Bug 1393830

Summary: Openshift node in "NodeStatusUnknow" state
Product: OpenShift Container Platform Reporter: Miheer Salunke <misalunk>
Component: Containers Assignee: Jhon Honce <jhonce>
Status: CLOSED INSUFFICIENT_DATA QA Contact: DeShuai Ma <dma>
Severity: high Docs Contact:
Priority: high    
Version: 3.2.1 CC: aos-bugs, erich, imcleod, jokerman, misalunk, mmccomas
Target Milestone: --- Keywords: Unconfirmed
Target Release: --- Flags: jhonce: needinfo? (misalunk)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-03 20:57:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Comment 3 Miheer Salunke 2016-11-21 16:57:43 UTC
Description of problem:

We monitor our production cluster by checking node states with the "oc get nodes" command every 5 minutes.
If a node is not in the Ready state, an alarm is triggered.
This case was opened after such an alarm.
We cannot reproduce the problem; this case covers 2 occurrences on 2 different nodes.
Before restarting services on the node in the unknown state, we checked the service status (systemctl status): everything seemed OK except docker.
We are trying to understand why docker stopped responding (pods had been working fine on the node before).
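
For reference, the monitoring check described above could look roughly like the sketch below. It is not the customer's actual script: the function names are hypothetical, and it assumes the default "oc get nodes" table layout with NAME in the first column and STATUS in the second.

```python
import subprocess
from typing import List, Tuple


def not_ready_nodes(oc_output: str) -> List[Tuple[str, str]]:
    """Return (name, status) for every node whose STATUS lacks 'Ready'.

    Assumes the default table layout of "oc get nodes":
    a header row, then NAME and STATUS as the first two columns.
    """
    flagged = []
    lines = oc_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        name, status = fields[0], fields[1]
        # STATUS may be a comma-separated list, e.g. "Ready,SchedulingDisabled"
        if "Ready" not in status.split(","):
            flagged.append((name, status))
    return flagged


def check_cluster() -> List[Tuple[str, str]]:
    """Run "oc get nodes" once and return the nodes that should raise an alarm."""
    result = subprocess.run(
        ["oc", "get", "nodes"],
        capture_output=True,
        text=True,
        check=True,
        timeout=60,  # avoid hanging the monitor if the API server is slow
    )
    return not_ready_nodes(result.stdout)
```

Calling check_cluster() from a 5-minute cron job or timer and alarming on a non-empty result would match the schedule described above. Recording the time of the first non-Ready result also gives a timestamp to correlate with the node and docker logs.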




Version-Release number of selected component (if applicable):
OCP 3.2

How reproducible:
On customer side.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:


I see the following errors:

kubelet.go:2381] skipping pod synchronization - [container runtime is down]

I also saw a few errors like this one:

Nov 07 10:08:49 host.x-y.e.t.s atomic-openshift-node[91586]: E1107 10:08:49.068754   91586 container_manager_linux.go:267] failed to detect process id for "docker" - failed to find pid of "docker": exit status 1
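
To see how often each of the two messages quoted above occurs around an event, the node log can be scanned with something like the following sketch. The labels and the function name are hypothetical, and the patterns match only the exact message text shown in this comment.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns taken from the two messages quoted in this comment.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "runtime_down": re.compile(
        r"skipping pod synchronization - \[container runtime is down\]"
    ),
    "docker_pid_missing": re.compile(r'failed to find pid of "docker"'),
}


def count_runtime_errors(lines: Iterable[str]) -> Counter:
    """Count how many log lines match each known error pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

The input could be the lines of a saved node log, for example the journal output of the atomic-openshift-node unit (assuming the node service logs to the journal under that name, as the log prefix above suggests).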

Comment 4 Jhon Honce 2017-02-03 23:01:09 UTC
Is it possible to capture the docker logs during an event with the -D option on the docker daemon?  This will provide us with additional information.