Bug 1508346 - [CRI-O] Restart cri-o service encounter failure
Summary: [CRI-O] Restart cri-o service encounter failure
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.8.0
Assignee: Mrunal Patel
QA Contact: DeShuai Ma
Reported: 2017-11-01 09:35 UTC by DeShuai Ma
Modified: 2018-03-28 14:09 UTC
CC List: 8 users

Doc Type: If docs needed, set a value
Last Closed: 2018-03-28 14:08:55 UTC



Links
System: Red Hat Product Errata | ID: RHBA-2018:0489 | Last Updated: 2018-03-28 14:09:21 UTC

Description DeShuai Ma 2017-11-01 09:35:54 UTC
Description of problem:
On some nodes, restarting cri-o (running as a system container) always fails.

Version-Release number of selected component (if applicable):
# ./rootfs/usr/bin/crio --version
crio version 1.0.2
commit: "29077fa6fbd85f0ca9c453ab1bf1ff7b02bc3f5c"

openshift v3.7.0-0.188.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8
OS: rhel-7.4

How reproducible:
In some environments.

Steps to Reproduce:
[root@ip-172-18-3-194 netns]# systemctl restart cri-o
Job for cri-o.service failed because the control process exited with error code. See "systemctl status cri-o.service" and "journalctl -xe" for details.

Error log:
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal systemd[1]: Starting crio daemon...
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.925081160-04:00" level=debug msg="[graphdriver] trying provided driver "overlay""
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.925218549-04:00" level=debug msg="overlay: overide_kernelcheck=1"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.928550627-04:00" level=warning msg="Using pre-4.0.0 kernel for overlay, mount failures may require kernel update"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.938039155-04:00" level=debug msg="backingFs=xfs,  projectQuotaSupported=false"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957034087-04:00" level=warning msg="hooks path: "/usr/share/containers/oci/hooks.d" does not exist"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957098896-04:00" level=warning msg="hooks path: "/etc/containers/oci/hooks.d" does not exist"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957431126-04:00" level=info msg="CNI network openshift-sdn (type=openshift-sdn) is used from /etc/cni/net.d/80-openshift-network.conf"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.957644895-04:00" level=info msg="CNI network openshift-sdn (type=openshift-sdn) is used from /etc/cni/net.d/80-openshift-network.conf"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.978210108-04:00" level=debug msg="seccomp status: true"
Nov 01 05:01:02 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:02.979348865-04:00" level=debug msg="Golang's threads limit set to 52290"
Nov 01 05:01:06 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:06.742432894-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:10 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:10.500897129-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:14 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:14.261668911-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:18 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:18.032184561-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:21 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:21.771947837-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:25 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:25.510992870-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:29 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:29.251881146-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:32 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:32.996297655-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:36 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:36.737923757-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:40 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:40.477217623-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:44 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:44.220704877-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:48 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:47.998946313-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:51 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:51.738983656-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:55 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:55.487455974-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:01:59 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:01:59.239892295-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:03 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:03.039240830-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:06 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:06.802948555-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:10 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:10.548915463-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:14 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:14.438891402-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:18 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:18.185889009-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:21 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:21.927896222-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:25 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:25.683905109-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:29 ip-172-18-3-194.ec2.internal runc[87465]: time="2017-11-01 05:02:29.472924623-04:00" level=warning msg="failed to find container exit file: timed out waiting for the condition"
Nov 01 05:02:32 ip-172-18-3-194.ec2.internal systemd[1]: cri-o.service start operation timed out. Terminating.
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: cri-o.service: main process exited, code=exited, status=143/n/a
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: Failed to start crio daemon.
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: Unit cri-o.service entered failed state.
Nov 01 05:02:33 ip-172-18-3-194.ec2.internal systemd[1]: cri-o.service failed.

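Actual results:
The restart fails. crio repeatedly logs "failed to find container exit file: timed out waiting for the condition" until systemd reports that the start operation timed out, terminates the process, and marks cri-o.service as failed.

Expected results:
'systemctl restart cri-o' succeeds.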

Comment 6 Giuseppe Scrivano 2017-11-20 15:19:21 UTC
I could not reproduce this error here, but I've opened a PR that sets the timeout to infinity, as the contrib/systemd/crio.service file already does:

https://github.com/projectatomic/atomic-system-containers/pull/148
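In the log above, systemd gives up 90 seconds after "Starting crio daemon..." (05:01:02 to 05:02:32), which is consistent with systemd's default start timeout (DefaultTimeoutStartSec=90s). On a node that cannot yet pick up the rebuilt system container, the same idea can be applied locally with a systemd drop-in. The fragment below is a hedged sketch, not the contents of the PR: it assumes the unit is named cri-o.service (as in the log), and the file name 10-timeout.conf is hypothetical. Per systemd.service(5), TimeoutStartSec=0 disables the start timeout.

```ini
# /etc/systemd/system/cri-o.service.d/10-timeout.conf (hypothetical file name)
# Workaround sketch: disable systemd's start timeout for cri-o.service so a
# slow startup is not terminated after the default 90 seconds.
[Service]
TimeoutStartSec=0
```

After creating the file, run `systemctl daemon-reload` and then `systemctl restart cri-o`; `systemctl show cri-o -p TimeoutStartUSec` can be used to check the effective value. The trade-off is that a start that genuinely hangs will no longer be killed by systemd; a large finite value is a more conservative alternative.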

Comment 7 Antonio Murdaca 2017-11-20 15:21:50 UTC
DeShuai, could you re-test this once we have the system containers built?

Comment 8 Giuseppe Scrivano 2017-11-20 15:25:23 UTC
I am rebuilding gscrivano/cri-o-centos right now. It should be ready in a few minutes.

Comment 9 DeShuai Ma 2017-11-21 02:13:17 UTC
I'll re-test it

Comment 11 DeShuai Ma 2018-01-04 08:36:03 UTC
Verified on OCP 3.9:
# openshift version
openshift v3.9.0-0.16.0
kubernetes v1.9.0-beta1
etcd 3.2.8

# cd /var/lib/containers/atomic/cri-o.0/
# ./rootfs/usr/bin/crio --version
crio version 1.8.2

The error no longer occurs when restarting cri-o; 'systemctl restart cri-o' now succeeds.

Comment 14 errata-xmlrpc 2018-03-28 14:08:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

