Bug 1388867 - node service restart failed when a pod is running on this node
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Dan Williams
QA Contact: Meng Bo
Whiteboard: aos-scalability-34
Keywords: TestBlocker
Depends On:
Blocks: OSOPS_V3
 
Reported: 2016-10-26 06:39 EDT by Johnny Liu
Modified: 2017-03-08 13 EST
CC List: 11 users

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-18 07:46:12 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
node start failure log (388.29 KB, text/x-vhdl)
2016-10-26 06:39 EDT, Johnny Liu


External Trackers
Origin (GitHub) 11613, last updated 2016-11-01 10:01 EDT
Red Hat Product Errata RHBA-2017:0066, priority normal, status SHIPPED_LIVE: Red Hat OpenShift Container Platform 3.4 RPM Release Advisory, last updated 2017-01-18 12:23:26 EST

Description Johnny Liu 2016-10-26 06:39:23 EDT
Created attachment 1214248 [details]
node start failure log

Description of problem:
This bug was cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1388288#c7.

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.4.0.15+9c963ec
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1
# rpm -q docker
docker-1.10.3-57.el7.x86_64

How reproducible:
Always

Steps to Reproduce:

1. Install the environment successfully with the "redhat/openshift-ovs-multitenant" network plugin.
# openshift version
openshift v3.4.0.15+9c963ec
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1
# rpm -q docker
docker-1.10.3-57.el7.x86_64
# oc get nodes
NAME                           STATUS                     AGE
ip-172-18-10-70.ec2.internal   Ready                      1h
ip-172-18-6-3.ec2.internal     Ready,SchedulingDisabled   1h

2. Make sure no pod is running on the node.
# oc scale --replicas=0 dc/registry-console

3. Restart the node service; it succeeds.

4. Make sure a pod is running on the node.
# oc scale --replicas=1 dc/registry-console
# oc get po
NAME                       READY     STATUS    RESTARTS   AGE
registry-console-1-k2brf   1/1       Running   0          3m

5. Restart the node service; it fails.
# service atomic-openshift-node restart
Redirecting to /bin/systemctl restart  atomic-openshift-node.service
Job for atomic-openshift-node.service failed because a timeout was exceeded. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.

Actual results:
Restarting the node service fails.

Expected results:
The node service restarts successfully.

Additional info:
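For reference, when the restart times out as in step 5, the unit state and the recent node journal are the most useful things to capture; a minimal sketch, assuming the standard atomic-openshift-node systemd unit used above (the output file name is arbitrary):

# systemctl status atomic-openshift-node.service -l
# journalctl -u atomic-openshift-node.service --since "-15min" --no-pager > node-restart-failure.log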
Comment 1 Meng Bo 2016-10-26 07:11:18 EDT
It is not related to the plugin type; the problem exists in both the subnet and multitenant environments.

The relevant log lines, from my point of view, are:
Oct 25 08:26:01 ip-172-18-24-156.ec2.internal atomic-openshift-node[92648]: I1025 08:26:01.979679   92648 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]
Oct 25 08:26:31 ip-172-18-24-156.ec2.internal atomic-openshift-node[92648]: I1025 08:26:31.980867   92648 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]
Oct 25 08:26:36 ip-172-18-24-156.ec2.internal atomic-openshift-node[92648]: I1025 08:26:36.981065   92648 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]
Oct 25 08:27:02 ip-172-18-24-156.ec2.internal atomic-openshift-node[92947]: I1025 08:27:02.257550   92947 kubelet.go:2240] skipping pod synchronization - [network state unknown container runtime is down]

It seems that the node/kubelet cannot get the correct pod status, or cannot bring the existing pods back up after restarting.
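Those messages can be pulled straight out of the node journal; a minimal grep, assuming journald is collecting the atomic-openshift-node unit logs as in the excerpts above:

# journalctl -u atomic-openshift-node --no-pager | grep -E 'skipping pod synchronization|SDN pod network|container runtime is down'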
Comment 2 Ben Bennett 2016-10-27 08:49:25 EDT
Is this a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1388556 ?
Comment 3 Dan Williams 2016-10-28 18:01:34 EDT
Any chance you can get more of the node's logs, better yet with --loglevel=5?
Comment 4 Johnny Liu 2016-10-28 21:27:50 EDT
(In reply to Dan Williams from comment #3)
> Any chance you can get more of the node's logs, better yet with
> --loglevel=5?

The node logs were collected at --loglevel=5.
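For anyone reproducing this, the node log level is typically raised by editing the OPTIONS line in the node sysconfig file and restarting the service; a sketch, assuming the default RPM install layout where /etc/sysconfig/atomic-openshift-node carries an OPTIONS=--loglevel=N setting:

# sed -i 's/--loglevel=[0-9]*/--loglevel=5/' /etc/sysconfig/atomic-openshift-node
# systemctl restart atomic-openshift-node
# journalctl -u atomic-openshift-node -f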
Comment 6 Ben Bennett 2016-11-01 10:03:53 EDT
Can't be MODIFIED until the PR is merged.
Comment 7 Troy Dawson 2016-11-04 14:50:31 EDT
This has been merged into ose and is in OSE v3.4.0.22 or newer.
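Whether a given node already carries the fixed build can be checked from the installed package version, e.g. (assuming an RPM-based install):

# rpm -q atomic-openshift-node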
Comment 9 Johnny Liu 2016-11-07 03:33:19 EST
Verified this bug with atomic-openshift-3.4.0.22-1.git.0.5c56720.el7.x86_64, and it passes.

The node service now restarts successfully.
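For completeness, the re-check is essentially a re-run of the original steps: scale the pod back up, confirm it is Running, then restart the node service and confirm the unit comes back as active:

# oc scale --replicas=1 dc/registry-console
# oc get po
# systemctl restart atomic-openshift-node
# systemctl is-active atomic-openshift-node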
Comment 11 errata-xmlrpc 2017-01-18 07:46:12 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066
