Bug 1702655

Summary: Machine-config-server container is in unknown status after the cluster reboot
Product: OpenShift Container Platform
Reporter: weiwei jiang <wjiang>
Component: Containers
Assignee: Urvashi Mohnani <umohnani>
Status: CLOSED ERRATA
QA Contact: Sunil Choudhary <schoudha>
Severity: high
Docs Contact:
Priority: high
Version: 4.1.0
CC: aos-bugs, dwalsh, jokerman, mmccomas, mpatel, schoudha, umohnani
Target Milestone: ---
Keywords: BetaBlocker
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:47:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description weiwei jiang 2019-04-24 11:34:58 UTC
Description of problem:
After a cluster reboot, the machine-config-server container is in the Unknown state (CONTAINER_UNKNOWN).

Version-Release number of selected component (if applicable):
4.1.0-0.nightly-2019-04-22-005054

How reproducible:
sometimes

Steps to Reproduce:
1. Reboot the cluster after the install succeeds.
2. Check whether the machine-config-server container comes back up and serves.

Actual results:
[root@dell-r730-063 /]# crictl ps -a --no-trunc|grep -i machine-config-server 
3bd31fd3352afe1fdacafcf193c8b941794b1b35cafdf957e9cd2dc4150f2100   8b076e9dd38656b6f3717935aaa3078f8aea4f24418d7171744c4b8d4cafae35                                                      11 minutes ago      Unknown             machine-config-server                    3                   6f01d22d2d24d
[root@dell-r730-063 /]# crictl inspect 3bd31fd3352af 
{
  "status": {
    "id": "3bd31fd3352afe1fdacafcf193c8b941794b1b35cafdf957e9cd2dc4150f2100",
    "metadata": {
      "attempt": 3,
      "name": "machine-config-server"
    },
    "state": "CONTAINER_UNKNOWN",
    "createdAt": "1970-01-01T00:00:00Z",
    "startedAt": "1970-01-01T00:00:00Z",
    "finishedAt": "1970-01-01T00:00:00Z",
    "exitCode": 0,
    "image": {
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bdc2a21097fbbc233722a9fb3d7f4e7bb051d3df1ac62c761b22f7673f1a6d53"
    },
    "imageRef": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bdc2a21097fbbc233722a9fb3d7f4e7bb051d3df1ac62c761b22f7673f1a6d53",
    "reason": "",
    "message": "",
    "labels": {
      "io.kubernetes.container.name": "machine-config-server",
      "io.kubernetes.pod.name": "machine-config-server-r6flt",
      "io.kubernetes.pod.namespace": "openshift-machine-config-operator",
      "io.kubernetes.pod.uid": "c59def1e-65b2-11e9-bba5-801844ef10ac"
    },
    "annotations": {
      "io.kubernetes.container.hash": "6e6221ed",
      "io.kubernetes.container.restartCount": "3",
      "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
      "io.kubernetes.container.terminationMessagePolicy": "File",
      "io.kubernetes.pod.terminationGracePeriod": "30"
    },
    "mounts": [],
    "logPath": "/var/log/pods/c59def1e-65b2-11e9-bba5-801844ef10ac/machine-config-server/3.log"
  },
  "pid": 0,
  "sandboxId": "6f01d22d2d24d78af94098b679d4cdeeca775977456232fcf2d32bf5ba61d0c3"
}
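
The inspect output above shows two symptoms together: the state is CONTAINER_UNKNOWN and all three timestamps are reset to the Unix epoch. A minimal sketch of how that check could be scripted is below. The helper name `describe_container` and the trimmed `SAMPLE` payload are illustrative assumptions, not part of crictl or cri-o; only the field names are taken from the output in this report.

```python
import json

# Timestamp value that appears in the report for createdAt/startedAt/finishedAt.
EPOCH_ZERO = "1970-01-01T00:00:00Z"


def describe_container(inspect_json: str) -> dict:
    """Summarise the `crictl inspect` fields relevant to this bug.

    Hypothetical helper: it only reads keys that appear in the
    output pasted in this report.
    """
    status = json.loads(inspect_json)["status"]
    timestamps = [status.get(key) for key in ("createdAt", "startedAt", "finishedAt")]
    return {
        "name": status["metadata"]["name"],
        "state": status["state"],
        "is_unknown": status["state"] == "CONTAINER_UNKNOWN",
        "timestamps_unset": all(ts == EPOCH_ZERO for ts in timestamps),
    }


# Trimmed copy of the inspect output from this report.
SAMPLE = """
{
  "status": {
    "id": "3bd31fd3352afe1fdacafcf193c8b941794b1b35cafdf957e9cd2dc4150f2100",
    "metadata": {"attempt": 3, "name": "machine-config-server"},
    "state": "CONTAINER_UNKNOWN",
    "createdAt": "1970-01-01T00:00:00Z",
    "startedAt": "1970-01-01T00:00:00Z",
    "finishedAt": "1970-01-01T00:00:00Z",
    "exitCode": 0
  },
  "pid": 0
}
"""

if __name__ == "__main__":
    print(describe_container(SAMPLE))
```

On an affected node, the same function could be fed the real output, for example by reading `crictl inspect <container-id>` from standard input instead of using `SAMPLE`.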


Expected results:
The machine-config-server container should come back up and serve after the reboot.

Additional info:

Comment 1 Urvashi Mohnani 2019-04-24 13:59:19 UTC
Can you please paste the cri-o logs when you hit this? (journalctl -u crio).

Comment 2 Urvashi Mohnani 2019-04-24 14:02:03 UTC
Alternatively, could I get access to that cluster, if possible?

Comment 3 Mrunal Patel 2019-04-25 02:08:19 UTC
See https://github.com/cri-o/cri-o/pull/2285 for a fix.

Comment 4 Urvashi Mohnani 2019-05-01 08:47:20 UTC
The fix is in the new cri-o v1.13.7 build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=21395072

Comment 7 errata-xmlrpc 2019-06-04 10:47:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758