Description of problem:
On some nodes, restarting cri-o (running as a system container) always fails.

Version-Release number of selected component (if applicable):
# ./rootfs/usr/bin/crio --version
crio version 1.0.2
commit: "29077fa6fbd85f0ca9c453ab1bf1ff7b02bc3f5c"

openshift v3.7.0-0.188.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

OS: rhel-7.4

How reproducible:
In some environments

Steps to Reproduce:
[root@ip-172-18-3-194 netns]# systemctl restart cri-o
Job for cri-o.service failed because the control process exited with error code. See "systemctl status cri-o.service" and "journalctl -xe" for details.

//error log
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal systemd[1]: Starting crio daemon...
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.925081160-04:00" level=debug msg="[graphdriver] trying provided driver "overlay""
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.925218549-04:00" level=debug msg="overlay: overide_kernelcheck=1"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.928550627-04:00" level=warning msg="Using pre-4.0.0 kernel for overlay, mount failures may require kernel update"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.938039155-04:00" level=debug msg="backingFs=xfs, projectQuotaSupported=false"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957034087-04:00" level=warning msg="hooks path: "/usr/share/containers/oci/hooks.d" does not exist"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957098896-04:00" level=warning msg="hooks path: "/etc/containers/oci/hooks.d" does not exist"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957431126-04:00" level=info msg="CNI network openshift-sdn (type=openshift-sdn) is used from /etc/cni/net.d/80-openshift-network.conf"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957644895-04:00" level=info msg="CNI network openshift-sdn (type=openshift-sdn) is used from /etc/cni/net.d/80-openshift-network.conf"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.978210108-04:00" level=debug msg="seccomp status: true"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.979348865-04:00" level=debug msg="Golang's threads limit set to 52290"
Nov 01 05:01:06 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:06.742432894-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:10 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:10.500897129-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:14 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:14.261668911-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:18 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:18.032184561-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:21 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:21.771947837-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:25 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:25.510992870-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:29 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:29.251881146-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:32 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:32.996297655-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:36 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:36.737923757-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:40 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:40.477217623-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:44 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:44.220704877-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:48 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:47.998946313-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:51 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:51.738983656-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:55 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:55.487455974-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:59 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:59.239892295-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:03 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:03.039240830-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:06 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:06.802948555-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:10 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:10.548915463-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:14 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:14.438891402-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:18 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:18.185889009-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:21 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:21.927896222-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:25 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:25.683905109-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:29 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:29.472924623-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:32 ip-172-18-3-194.ec2.internal systemd[1]: cri-o.service start operation timed out. Terminating.
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: cri-o.service: main process exited, code=exited, status=143/n/a
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: Failed to start crio daemon.
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: Unit cri-o.service entered failed state.
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: cri-o.service failed.

Actual results:

Expected results:

Additional info:
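A note on reading the journal above: the gap between "Starting crio daemon..." and "start operation timed out" is 90 seconds, which matches systemd's stock start timeout (DefaultTimeoutStartSec=90s), and status=143 is the conventional 128 + 15 encoding of a SIGTERM exit. The sketch below only redoes that arithmetic with the two timestamps copied from the log. It assumes GNU date, and the variable names are illustrative.

```shell
#!/bin/sh
# Timestamps copied from the journal excerpt in this report.
start="2017-11-01 05:01:02"   # systemd: Starting crio daemon...
stop="2017-11-01 05:02:32"    # systemd: start operation timed out. Terminating.

# Assumes GNU date (-d); -u avoids any local time zone effects.
elapsed=$(( $(date -u -d "$stop" +%s) - $(date -u -d "$start" +%s) ))
echo "seconds before systemd gave up: $elapsed"

# Exit status reported for a process terminated by SIGTERM (signal 15).
sigterm_status=$(( 128 + 15 ))
echo "status for SIGTERM: $sigterm_status"
```

If the elapsed value equals the unit's configured start timeout, the daemon was most likely still initializing rather than crashing, which is consistent with the repeated "failed to find container exit file" warnings, one roughly every 4 seconds.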
I could not reproduce this error here, but I've opened a PR that sets the start timeout to infinity, as the contrib/systemd/crio.service file already does: https://github.com/projectatomic/atomic-system-containers/pull/148
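For anyone who needs a local workaround before a rebuilt system container is available, the same idea can be tried with a systemd drop-in. This is only a sketch, not the content of the PR: the drop-in path and file name are assumptions, and the unit name cri-o.service is taken from the log in this report.

```ini
# Hypothetical drop-in: /etc/systemd/system/cri-o.service.d/10-timeout.conf
# (path and file name are illustrative, not taken from the PR)
[Service]
# 0 disables the start timeout on systemd versions such as the one in RHEL 7;
# newer systemd releases also accept the value "infinity".
TimeoutStartSec=0
```

After adding the file, run 'systemctl daemon-reload' followed by 'systemctl restart cri-o'. Note that this only removes the 90-second limit; if startup never completes, the unit will stay in the "activating" state instead of failing.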
DeShuai, could you re-test this once we have the system containers built?
I am rebuilding gscrivano/cri-o-centos right now. It should be ready in a few minutes.
I'll re-test it
Verified on ocp-3.9:
# openshift version
openshift v3.9.0-0.16.0
kubernetes v1.9.0-beta1
etcd 3.2.8

# cd /var/lib/containers/atomic/cri-o.0/
# ./rootfs/usr/bin/crio --version
crio version 1.8.2

Restarting cri-o no longer produces this error; 'systemctl restart cri-o' now succeeds.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489