Description of problem:
We monitor our production cluster by checking node states with the "oc get nodes" command every 5 minutes. If a node is not in the Ready state, an alarm is triggered. This case was opened after such an alarm. We cannot reproduce the problem, but this case contains 2 occurrences on 2 different nodes. Before restarting services on the node in "unknown state", we checked the status of the services (systemctl status); everything seemed OK except docker. We are trying to understand why docker stopped responding (pods had been working fine on the node before).

Version-Release number of selected component (if applicable):
OCP 3.2

How reproducible:
On customer side.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I see the following error:

kubelet.go:2381] skipping pod synchronization - [container runtime is down]

I also saw a few errors like this one:

Nov 07 10:08:49 host.x-y.e.t.s atomic-openshift-node[91586]: E1107 10:08:49.068754 91586 container_manager_linux.go:267] failed to detect process id for "docker" - failed to find pid of "docker": exit status 1
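For reference, the node-state check described above could look roughly like the following. This is only a minimal sketch, not the customer's actual monitoring script: the function names are hypothetical, and it assumes the default table output of "oc get nodes" with a header row and the node status in the second column (for example "Ready", "NotReady", or "Ready,SchedulingDisabled").

```python
import subprocess
from typing import List


def fetch_oc_output() -> str:
    """Run `oc get nodes` and return its text output (requires a logged-in oc client)."""
    result = subprocess.run(
        ["oc", "get", "nodes"], capture_output=True, text=True, check=True
    )
    return result.stdout


def not_ready_nodes(oc_output: str) -> List[str]:
    """Return the names of nodes whose STATUS column does not contain 'Ready'.

    Assumption: the second whitespace-separated column is the status and may
    hold several comma-separated conditions.
    """
    flagged = []
    for line in oc_output.strip().splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[1]
        if "Ready" not in status.split(","):
            flagged.append(name)
    return flagged
```

A monitoring job run every 5 minutes could call not_ready_nodes(fetch_oc_output()) and raise an alarm whenever the returned list is not empty.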
Is it possible to capture the docker logs during an event with the -D option on the docker daemon? This will provide us with additional information.
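One way to collect the docker logs around an event is sketched below. It rests on assumptions not confirmed in this case: that the docker daemon runs as a systemd unit named "docker" and logs to the journal, and that debug logging (-D) has already been enabled in the daemon options. The helper name and the 10-minute window are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def docker_journal_command(
    event_time: datetime, window_minutes: int = 10, unit: str = "docker"
) -> List[str]:
    """Build a journalctl command covering a time window around an event.

    Assumption: the daemon's systemd unit is called `unit` and its output
    goes to the journal.
    """
    start = event_time - timedelta(minutes=window_minutes)
    end = event_time + timedelta(minutes=window_minutes)
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp format accepted by --since/--until
    return [
        "journalctl",
        "--no-pager",
        "-u", unit,
        "--since", start.strftime(fmt),
        "--until", end.strftime(fmt),
    ]
```

The returned list can be passed to subprocess.run on the affected node, with the output redirected to a file and attached to the case.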