Bug 1594943 - Logspam : lot of crio related debug messages in logs
Summary: Logspam : lot of crio related debug messages in logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 3.11.0
Assignee: Urvashi Mohnani
QA Contact: Vikas Laad
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-06-25 19:34 UTC by Vikas Laad
Modified: 2018-10-11 07:21 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 07:20:54 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2652 0 None None None 2018-10-11 07:21:18 UTC

Description Vikas Laad 2018-06-25 19:34:23 UTC
Description of problem:
I see lot of message when debug is not enabled on the crio runtime compute nodes, see fillowing

Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33015]: conmon <ninfo>: about to waitpid: 33016
Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33037]: conmon <ninfo>: about to waitpid: 33038
Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33015]: conmon <ninfo>: container PID: 33030
Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33037]: conmon <ninfo>: container PID: 33048
Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal crio[72942]: time="2018-06-23 09:02:46.962863944Z" level=info msg="Received container exit code: 0, message: "
Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33037]: conmon <ninfo>: container 33048 exited with status 0
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33067]: conmon <ninfo>: about to waitpid: 33068
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33067]: conmon <ninfo>: container PID: 33077
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33092]: conmon <ninfo>: about to waitpid: 33094
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33067]: conmon <ninfo>: container 33077 exited with status 0
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal crio[72942]: time="2018-06-23 09:02:47.144350827Z" level=info msg="Received container exit code: 0, message: "
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33107]: conmon <ninfo>: about to waitpid: 33109
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33092]: conmon <ninfo>: container PID: 33111
Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33092]: conmon <ninfo>: container 33111 exited with status 0

Here are the counts in last 5 days

root@ip-172-31-58-199: ~ # cat logs | grep "Received container exit code:" | wc -l
1180883
root@ip-172-31-58-199: ~ # cat logs | grep "about to waitpid:" | wc -l
1198134

Version-Release number of selected component (if applicable):
openshift v3.10.1

crio version 1.10.3
commit: "54383a69903e1b46526be3dbe0bea71b14f00e46-dirty"


Steps to Reproduce:
1. create a cluster with crio-runtime
2. create bunch of quickstart apps and watch jounal logs

Actual results:
Lot of journal messages

Expected results:
These messages should be only printed in debug mode

Additional info:

Comment 1 Vikas Laad 2018-06-25 19:51:03 UTC
I see following message also very frequently, these look like real error

Jun 25 16:05:21 ip-172-31-58-199.us-west-2.compute.internal conmon[19377]: conmon <ninfo>: Timed out, killing main loop
Jun 25 16:05:21 ip-172-31-58-199.us-west-2.compute.internal conmon[19377]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable

root@ip-172-31-58-199: ~ # cat logs | grep -i "Timed out, killing main loop" | wc -l
268397

Comment 3 Mrunal Patel 2018-06-25 23:21:47 UTC
Vikas, Are you seeing the "Resource temporarily unavailable" after the timeout messages? Or on container exit? Are they also visible besides these conditions?

Comment 4 Vikas Laad 2018-06-26 02:37:07 UTC
Here are how logs in the journal, I can upload full logs somewhere if you need.

Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19528]: conmon <ninfo>: about to waitpid: 19529
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19528]: conmon <ninfo>: container PID: 19539
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19528]: conmon <ninfo>: container 19539 exited with status 0
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.384358047Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19434]: conmon <ninfo>: Timed out, killing main loop
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19434]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.611556840Z" level=info msg="Received container exit code: -1, message: command timed out"
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19434]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:22.612321   73875 remote_runtime.go:332] ExecSync 6a631bab49d3049d62fdd7bdd183ddaddab67788560cb979b14e6b81302ec79f '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <ninfo>: about to waitpid: 19564
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <ninfo>: container PID: 19573
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19586]: conmon <ninfo>: about to waitpid: 19587
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19586]: conmon <ninfo>: container PID: 19598
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19613]: conmon <ninfo>: about to waitpid: 19614
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19586]: conmon <ninfo>: container 19598 exited with status 0
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.899224776Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19613]: conmon <ninfo>: container PID: 19623
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19613]: conmon <ninfo>: container 19623 exited with status 0
Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.983593083Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19642]: conmon <ninfo>: about to waitpid: 19643
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19642]: conmon <ninfo>: container PID: 19652
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19642]: conmon <ninfo>: container 19652 exited with status 0
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:23.284091437Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19671]: conmon <ninfo>: about to waitpid: 19672
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <ninfo>: Timed out, killing main loop
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:23.613880710Z" level=info msg="Received container exit code: -1, message: command timed out"
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:23.615391   73875 remote_runtime.go:332] ExecSync 6a631bab49d3049d62fdd7bdd183ddaddab67788560cb979b14e6b81302ec79f '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19671]: conmon <ninfo>: container PID: 19681
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: I0625 16:05:23.615533   73875 prober.go:106] Liveness probe for "eap-app-5-qs9pw_eap64-mysql-s2i-u-1842-661-1943-698(6319e54a-7890-11e8-8751-0248c18cc060):eap-app" errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19671]: conmon <ninfo>: container 19681 exited with status 0
Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:23.654058786Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19696]: conmon <ninfo>: about to waitpid: 19697
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19696]: conmon <ninfo>: container PID: 19706
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19721]: conmon <ninfo>: about to waitpid: 19722
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19696]: conmon <ninfo>: container 19706 exited with status 0
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:24.261555626Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19721]: conmon <ninfo>: container PID: 19731
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19721]: conmon <ninfo>: container 19731 exited with status 0
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:24.282839261Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19748]: conmon <ninfo>: about to waitpid: 19749
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19748]: conmon <ninfo>: container PID: 19758
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19748]: conmon <ninfo>: container 19758 exited with status 0
Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:24.679368086Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19774]: conmon <ninfo>: about to waitpid: 19775
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19774]: conmon <ninfo>: container PID: 19784
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19774]: conmon <ninfo>: container 19784 exited with status 0
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.363138300Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19802]: conmon <ninfo>: about to waitpid: 19803
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19802]: conmon <ninfo>: container PID: 19812
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19802]: conmon <ninfo>: container 19812 exited with status 0
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.418300765Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19829]: conmon <ninfo>: about to waitpid: 19830
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19829]: conmon <ninfo>: container PID: 19839
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19829]: conmon <ninfo>: container 19839 exited with status 0
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.577276172Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19854]: conmon <ninfo>: about to waitpid: 19855
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19854]: conmon <ninfo>: container PID: 19864
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19854]: conmon <ninfo>: container 19864 exited with status 0
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.715156422Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19879]: conmon <ninfo>: about to waitpid: 19880
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19879]: conmon <ninfo>: container PID: 19889
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19879]: conmon <ninfo>: container 19889 exited with status 0
Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.861091936Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19907]: conmon <ninfo>: about to waitpid: 19908
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19907]: conmon <ninfo>: container PID: 19917
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19907]: conmon <ninfo>: container 19917 exited with status 0
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:26.480363844Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <ninfo>: about to waitpid: 19937
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <ninfo>: container PID: 19946
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19958]: conmon <ninfo>: about to waitpid: 19959
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19958]: conmon <ninfo>: container PID: 19968
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19958]: conmon <ninfo>: container 19968 exited with status 0
Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:26.664402398Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <ninfo>: Timed out, killing main loop
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:27.611964   73875 remote_runtime.go:332] ExecSync 8464aff854057895ea972cf46db1ca756678c2c38442c9c96a24bd246d43c166 '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:27.611090480Z" level=info msg="Received container exit code: -1, message: command timed out"
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <ninfo>: about to waitpid: 19987
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <ninfo>: container PID: 19996
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20009]: conmon <ninfo>: about to waitpid: 20010
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20009]: conmon <ninfo>: container PID: 20019
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20009]: conmon <ninfo>: container 20019 exited with status 0
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:27.871079831Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20037]: conmon <ninfo>: about to waitpid: 20038
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20037]: conmon <ninfo>: container PID: 20047
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20037]: conmon <ninfo>: container 20047 exited with status 0
Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:27.944808227Z" level=info msg="Received container exit code: 0, message: "
Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <ninfo>: Timed out, killing main loop
Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:28.612917564Z" level=info msg="Received container exit code: -1, message: command timed out"
Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:28.613660   73875 remote_runtime.go:332] ExecSync 8464aff854057895ea972cf46db1ca756678c2c38442c9c96a24bd246d43c166 '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[20067]: conmon <ninfo>: about to waitpid: 20068
Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[20067]: conmon <ninfo>: container PID: 20077
Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal conmon[20089]: conmon <ninfo>: about to waitpid: 20090
Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal conmon[20089]: conmon <ninfo>: container PID: 20099
Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal conmon[20089]: conmon <ninfo>: container 20099 exited with status 0
Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:29.217585719Z" level=info msg="Received container exit code: 0, message: "

Comment 5 Urvashi Mohnani 2018-06-26 15:14:35 UTC
Log issue and drain issue being fixed in https://github.com/kubernetes-incubator/cri-o/pull/1648.

Comment 6 Mrunal Patel 2018-06-27 13:56:15 UTC
Fix is in crio 1.10.4

Comment 7 Vikas Laad 2018-06-29 23:04:27 UTC
I still see following errors in journal logs, i have cri-o 1.10.4-1.rhaos3.10.gitebaa77a.el7 installed

Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28185]: conmon <ninfo>: container 28196 exited with status 0
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:03.016658661Z" level=info msg="Received container exit code: 0, message: "
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28214]: conmon <ninfo>: about to waitpid: 28215
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28214]: conmon <ninfo>: container PID: 28224
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28214]: conmon <ninfo>: container 28224 exited with status 0
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:03.179715627Z" level=info msg="Received container exit code: 0, message: "
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28163]: conmon <ninfo>: Timed out, killing main loop
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28163]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:03.942708167Z" level=info msg="Received container exit code: -1, message: command timed out"
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28163]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal atomic-openshift-node[16620]: E0629 23:02:03.943223   16620 remote_runtime.go:332] ExecSync a433aecc2bc7a1aa4c16162364b6f48818db092057711648686dc74b66ecdfbb '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal atomic-openshift-node[16620]: I0629 23:02:03.943310   16620 prober.go:106] Liveness probe for "eap-app-3-4nfp9_eap64-mysql-s2i-u-2-2-67-26(742814d5-7bee-11e8-9beb-0223e9428e08):eap-app" errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28244]: conmon <ninfo>: about to waitpid: 28245
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28244]: conmon <ninfo>: container PID: 28254
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28244]: conmon <ninfo>: container 28254 exited with status 0
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:05.605255183Z" level=info msg="Received container exit code: 0, message: "
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28268]: conmon <ninfo>: about to waitpid: 28269
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28268]: conmon <ninfo>: container PID: 28278
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28268]: conmon <ninfo>: container 28278 exited with status 0
Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:05.690385238Z" level=info msg="Received container exit code: 0, message: "
Jun 29 23:02:08 ip-172-31-24-55.us-west-2.compute.internal conmon[28298]: conmon <ninfo>: about to waitpid: 28299
Jun 29 23:02:08 ip-172-31-24-55.us-west-2.compute.internal conmon[28298]: conmon <ninfo>: container PID: 28308

Comment 8 Mrunal Patel 2018-06-29 23:39:20 UTC
We missed the necessary commit in the v1.10.4 tag by mistake. I am tagging a v1.10.5 for crio with the fix.

Comment 9 Mrunal Patel 2018-06-30 15:00:30 UTC
https://github.com/kubernetes-incubator/cri-o/releases/tag/v1.10.5 has been released. We will have new builds for cri-o 1.10.5 soon.

Comment 12 Wei Sun 2018-07-04 02:38:35 UTC
crio-1.10.5 is available in 3.10.12-1

Comment 13 Vikas Laad 2018-07-05 14:20:55 UTC
I still see following in logs in cri-o-1.10.5-1.rhaos3.10.git4c81456.el7.x86_64

Jul 05 14:07:46 ip-172-31-24-55.us-west-2.compute.internal conmon[86487]: conmon <ninfo>: container PID: 86497
Jul 05 14:07:46 ip-172-31-24-55.us-west-2.compute.internal conmon[86516]: conmon <ninfo>: container PID: 86526
Jul 05 14:07:47 ip-172-31-24-55.us-west-2.compute.internal conmon[86544]: conmon <ninfo>: container PID: 86554
Jul 05 14:07:47 ip-172-31-24-55.us-west-2.compute.internal conmon[86574]: conmon <ninfo>: container PID: 86584
Jul 05 14:07:48 ip-172-31-24-55.us-west-2.compute.internal conmon[86605]: conmon <ninfo>: container PID: 86615
Jul 05 14:07:48 ip-172-31-24-55.us-west-2.compute.internal conmon[86629]: conmon <ninfo>: container PID: 86639
Jul 05 14:07:48 ip-172-31-24-55.us-west-2.compute.internal conmon[86654]: conmon <ninfo>: container PID: 86664
Jul 05 14:07:49 ip-172-31-24-55.us-west-2.compute.internal conmon[86679]: conmon <ninfo>: container PID: 86689
Jul 05 14:07:49 ip-172-31-24-55.us-west-2.compute.internal conmon[86708]: conmon <ninfo>: container PID: 86718

Comment 14 Urvashi Mohnani 2018-07-05 16:47:35 UTC
Vikas, I believe we left the "container PID" logs as it would be helpful in debugging future issues.
Mrunal, should we leave the container PID logs, or should I go ahead and remove them as well?

Comment 15 Urvashi Mohnani 2018-07-05 17:07:26 UTC
Spoke with Mrunal, we want to retain the container PID logs. Moving this back to ON_QA.

Comment 17 Vikas Laad 2018-07-11 12:59:26 UTC
Here is the frequency of these errors

 # journalctl | grep 'conmon <ninfo>: container PID' | wc -l
398638

If we are OK with this frequency without increasing loglevel I will make this bz verified. I think this should be printed at debug level or when loglevel is increased.

Comment 18 Vikas Laad 2018-07-11 13:00:26 UTC
these messages were printed in less then 2 days in a crio cluster on compute node.

Comment 19 Urvashi Mohnani 2018-07-11 16:57:20 UTC
Vikas, we are removing this log in https://github.com/kubernetes-incubator/cri-o/pull/1692.

Comment 20 Vikas Laad 2018-07-12 18:28:55 UTC
Please set it ON_QA again when its available in puddle.

Comment 21 Urvashi Mohnani 2018-07-31 12:39:16 UTC
Fixed in https://github.com/kubernetes-incubator/cri-o/pull/1692. Vikas please try it out.

Comment 22 Siva Reddy 2018-08-21 14:09:21 UTC
Verified that there is no log spam:
# cat crio-logs.log |  grep 'conmon <ninfo>: container PID' | wc -l
0
# cat crio-logs.log | grep -i "Timed out, killing main loop" | wc -l                                                                       
0
# cat crio-logs.log | grep "about to waitpid:" | wc -l                                                                                     
0
# cat crio-logs.log | grep "Received container exit code:" | wc -l                                                                         
0

Verified version info:

openshift v3.11.0-0.17.0
kubernetes v1.11.0+d4cacc0
oc v3.11.0-0.14.0

Comment 29 errata-xmlrpc 2018-10-11 07:20:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652


Note You need to log in before you can comment on or make changes to this bug.