Bug 1774184
Summary: | [conmon] Liveness probes timeout unexpectedly | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Brian Jarvis <bjarvis> |
Component: | Node | Assignee: | Jindrich Novy <jnovy> |
Status: | CLOSED ERRATA | QA Contact: | Weinan Liu <weinliu> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.11.0 | CC: | aos-bugs, dapark, dornelas, dwalsh, eparis, jnovy, jokerman, minmli, mpatel, nagrawal, pehunt, tsweeney, weinliu |
Target Milestone: | --- | ||
Target Release: | 3.11.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | conmon-2.0.8-1.el7.x86_64 cri-o-1.11.16-0.9.dev.rhaos3.11.git6d43aae.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-05-28 05:44:13 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1186913 |
Description
Brian Jarvis
2019-11-19 19:03:11 UTC
Peter, another one for the 4.3 deadline.

The issue is in conmon; I've opened a PR here: https://github.com/containers/conmon/pull/95

conmon-2.0.6, containing this fix, is now built for rhaos-4.3-rhel-8.

Checked with version 4.3.0-0.nightly-2019-12-26-101933:

    $ oc describe pod test-timeout-1-nq2qb
    ...
    Events:
      Type     Reason     Age                 From                                                     Message
      ----     ------     ----                ----                                                     -------
      Normal   Scheduled  <unknown>           default-scheduler                                        Successfully assigned default/test-timeout-1-nq2qb to ip-10-0-157-73.ap-northeast-1.compute.internal
      Normal   Pulling    2m29s               kubelet, ip-10-0-157-73.ap-northeast-1.compute.internal  Pulling image "busybox"
      Normal   Pulled     2m21s               kubelet, ip-10-0-157-73.ap-northeast-1.compute.internal  Successfully pulled image "busybox"
      Normal   Created    2m21s               kubelet, ip-10-0-157-73.ap-northeast-1.compute.internal  Created container test-timeout
      Normal   Started    2m21s               kubelet, ip-10-0-157-73.ap-northeast-1.compute.internal  Started container test-timeout
      Warning  Unhealthy  4s (x13 over 2m4s)  kubelet, ip-10-0-157-73.ap-northeast-1.compute.internal  Liveness probe errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1

The container is not being restarted even though the liveness probe is reporting failures. In addition, if timeoutSeconds is set above 60 seconds, the probe does not report failures at all.

Opened https://github.com/cri-o/cri-o/pull/3065 to fix timeouts not resulting in restarts.

I'm setting the target release of the current issue to 3.11.z, as we have fixed it in the 4.x series, and the only remaining open portions of this bug are in 3.11.

Fixed in the version below:

    $ oc version
    oc v3.11.219
    kubernetes v1.11.0+d4cacc0
    features: Basic-Auth GSSAPI Kerberos SPNEGO

    Server https://juzhao-311master-etcd-nfs-1:8443
    openshift v3.11.219
    kubernetes v1.11.0+d4cacc0

Using manifest [1] from the description, the issue no longer occurs after waiting 6 minutes:

    $ oc describe po test-timeout-1-ss8c5
    ...
    Events:
      Type    Reason     Age  From                       Message
      ----    ------     ---- ----                       -------
      Normal  Scheduled  6m   default-scheduler          Successfully assigned default/test-timeout-1-ss8c5 to juzhao-311node-2
      Normal  Pulling    6m   kubelet, juzhao-311node-2  pulling image "busybox"
      Normal  Pulled     6m   kubelet, juzhao-311node-2  Successfully pulled image "busybox"
      Normal  Created    6m   kubelet, juzhao-311node-2  Created container
      Normal  Started    6m   kubelet, juzhao-311node-2  Started container
    ...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215