Description of problem: I see lot of message when debug is not enabled on the crio runtime compute nodes, see fillowing Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33015]: conmon <ninfo>: about to waitpid: 33016 Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33037]: conmon <ninfo>: about to waitpid: 33038 Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33015]: conmon <ninfo>: container PID: 33030 Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33037]: conmon <ninfo>: container PID: 33048 Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal crio[72942]: time="2018-06-23 09:02:46.962863944Z" level=info msg="Received container exit code: 0, message: " Jun 23 09:02:46 ip-172-31-58-199.us-west-2.compute.internal conmon[33037]: conmon <ninfo>: container 33048 exited with status 0 Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33067]: conmon <ninfo>: about to waitpid: 33068 Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33067]: conmon <ninfo>: container PID: 33077 Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33092]: conmon <ninfo>: about to waitpid: 33094 Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33067]: conmon <ninfo>: container 33077 exited with status 0 Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal crio[72942]: time="2018-06-23 09:02:47.144350827Z" level=info msg="Received container exit code: 0, message: " Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33107]: conmon <ninfo>: about to waitpid: 33109 Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33092]: conmon <ninfo>: container PID: 33111 Jun 23 09:02:47 ip-172-31-58-199.us-west-2.compute.internal conmon[33092]: conmon <ninfo>: container 33111 exited with status 0 Here are the counts in last 5 days root@ip-172-31-58-199: ~ # cat logs | grep "Received container exit code:" | wc -l 1180883 root@ip-172-31-58-199: ~ # cat logs | grep "about to waitpid:" | wc -l 1198134 Version-Release number of selected component (if applicable): openshift v3.10.1 crio version 1.10.3 commit: "54383a69903e1b46526be3dbe0bea71b14f00e46-dirty" Steps to Reproduce: 1. create a cluster with crio-runtime 2. create bunch of quickstart apps and watch jounal logs Actual results: Lot of journal messages Expected results: These messages should be only printed in debug mode Additional info:
I see following message also very frequently, these look like real error Jun 25 16:05:21 ip-172-31-58-199.us-west-2.compute.internal conmon[19377]: conmon <ninfo>: Timed out, killing main loop Jun 25 16:05:21 ip-172-31-58-199.us-west-2.compute.internal conmon[19377]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable root@ip-172-31-58-199: ~ # cat logs | grep -i "Timed out, killing main loop" | wc -l 268397
Vikas, Are you seeing the "Resource temporarily unavailable" after the timeout messages? Or on container exit? Are they also visible besides these conditions?
Here are how logs in the journal, I can upload full logs somewhere if you need. Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19528]: conmon <ninfo>: about to waitpid: 19529 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19528]: conmon <ninfo>: container PID: 19539 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19528]: conmon <ninfo>: container 19539 exited with status 0 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.384358047Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19434]: conmon <ninfo>: Timed out, killing main loop Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19434]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.611556840Z" level=info msg="Received container exit code: -1, message: command timed out" Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19434]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:22.612321 73875 remote_runtime.go:332] ExecSync 6a631bab49d3049d62fdd7bdd183ddaddab67788560cb979b14e6b81302ec79f '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <ninfo>: about to waitpid: 19564 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <ninfo>: container PID: 19573 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19586]: conmon <ninfo>: about to waitpid: 19587 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19586]: conmon <ninfo>: container PID: 19598 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19613]: conmon <ninfo>: about to waitpid: 19614 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19586]: conmon <ninfo>: container 19598 exited with status 0 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.899224776Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19613]: conmon <ninfo>: container PID: 19623 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal conmon[19613]: conmon <ninfo>: container 19623 exited with status 0 Jun 25 16:05:22 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:22.983593083Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19642]: conmon <ninfo>: about to waitpid: 19643 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19642]: conmon <ninfo>: container PID: 19652 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19642]: conmon <ninfo>: container 19652 exited with status 0 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:23.284091437Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19671]: conmon <ninfo>: about to waitpid: 19672 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <ninfo>: Timed out, killing main loop Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19563]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:23.613880710Z" level=info msg="Received container exit code: -1, message: command timed out" Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:23.615391 73875 remote_runtime.go:332] ExecSync 6a631bab49d3049d62fdd7bdd183ddaddab67788560cb979b14e6b81302ec79f '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19671]: conmon <ninfo>: container PID: 19681 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: I0625 16:05:23.615533 73875 prober.go:106] Liveness probe for "eap-app-5-qs9pw_eap64-mysql-s2i-u-1842-661-1943-698(6319e54a-7890-11e8-8751-0248c18cc060):eap-app" errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal conmon[19671]: conmon <ninfo>: container 19681 exited with status 0 Jun 25 16:05:23 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:23.654058786Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19696]: conmon <ninfo>: about to waitpid: 19697 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19696]: conmon <ninfo>: container PID: 19706 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19721]: conmon <ninfo>: about to waitpid: 19722 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19696]: conmon <ninfo>: container 19706 exited with status 0 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:24.261555626Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19721]: conmon <ninfo>: container PID: 19731 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19721]: conmon <ninfo>: container 19731 exited with status 0 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:24.282839261Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19748]: conmon <ninfo>: about to waitpid: 19749 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19748]: conmon <ninfo>: container PID: 19758 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal conmon[19748]: conmon <ninfo>: container 19758 exited with status 0 Jun 25 16:05:24 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:24.679368086Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19774]: conmon <ninfo>: about to waitpid: 19775 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19774]: conmon <ninfo>: container PID: 19784 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19774]: conmon <ninfo>: container 19784 exited with status 0 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.363138300Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19802]: conmon <ninfo>: about to waitpid: 19803 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19802]: conmon <ninfo>: container PID: 19812 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19802]: conmon <ninfo>: container 19812 exited with status 0 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.418300765Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19829]: conmon <ninfo>: about to waitpid: 19830 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19829]: conmon <ninfo>: container PID: 19839 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19829]: conmon <ninfo>: container 19839 exited with status 0 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.577276172Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19854]: conmon <ninfo>: about to waitpid: 19855 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19854]: conmon <ninfo>: container PID: 19864 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19854]: conmon <ninfo>: container 19864 exited with status 0 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.715156422Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19879]: conmon <ninfo>: about to waitpid: 19880 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19879]: conmon <ninfo>: container PID: 19889 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal conmon[19879]: conmon <ninfo>: container 19889 exited with status 0 Jun 25 16:05:25 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:25.861091936Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19907]: conmon <ninfo>: about to waitpid: 19908 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19907]: conmon <ninfo>: container PID: 19917 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19907]: conmon <ninfo>: container 19917 exited with status 0 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:26.480363844Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <ninfo>: about to waitpid: 19937 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <ninfo>: container PID: 19946 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19958]: conmon <ninfo>: about to waitpid: 19959 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19958]: conmon <ninfo>: container PID: 19968 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal conmon[19958]: conmon <ninfo>: container 19968 exited with status 0 Jun 25 16:05:26 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:26.664402398Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <ninfo>: Timed out, killing main loop Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:27.611964 73875 remote_runtime.go:332] ExecSync 8464aff854057895ea972cf46db1ca756678c2c38442c9c96a24bd246d43c166 '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:27.611090480Z" level=info msg="Received container exit code: -1, message: command timed out" Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19936]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <ninfo>: about to waitpid: 19987 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <ninfo>: container PID: 19996 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20009]: conmon <ninfo>: about to waitpid: 20010 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20009]: conmon <ninfo>: container PID: 20019 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20009]: conmon <ninfo>: container 20019 exited with status 0 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:27.871079831Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20037]: conmon <ninfo>: about to waitpid: 20038 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20037]: conmon <ninfo>: container PID: 20047 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal conmon[20037]: conmon <ninfo>: container 20047 exited with status 0 Jun 25 16:05:27 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:27.944808227Z" level=info msg="Received container exit code: 0, message: " Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <ninfo>: Timed out, killing main loop Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[19986]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:28.612917564Z" level=info msg="Received container exit code: -1, message: command timed out" Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal atomic-openshift-node[73875]: E0625 16:05:28.613660 73875 remote_runtime.go:332] ExecSync 8464aff854057895ea972cf46db1ca756678c2c38442c9c96a24bd246d43c166 '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1 Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[20067]: conmon <ninfo>: about to waitpid: 20068 Jun 25 16:05:28 ip-172-31-58-199.us-west-2.compute.internal conmon[20067]: conmon <ninfo>: container PID: 20077 Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal conmon[20089]: conmon <ninfo>: about to waitpid: 20090 Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal conmon[20089]: conmon <ninfo>: container PID: 20099 Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal conmon[20089]: conmon <ninfo>: container 20099 exited with status 0 Jun 25 16:05:29 ip-172-31-58-199.us-west-2.compute.internal crio[72189]: time="2018-06-25 16:05:29.217585719Z" level=info msg="Received container exit code: 0, message: "
Log issue and drain issue being fixed in https://github.com/kubernetes-incubator/cri-o/pull/1648.
Fix is in crio 1.10.4
I still see following errors in journal logs, i have cri-o 1.10.4-1.rhaos3.10.gitebaa77a.el7 installed Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28185]: conmon <ninfo>: container 28196 exited with status 0 Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:03.016658661Z" level=info msg="Received container exit code: 0, message: " Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28214]: conmon <ninfo>: about to waitpid: 28215 Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28214]: conmon <ninfo>: container PID: 28224 Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28214]: conmon <ninfo>: container 28224 exited with status 0 Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:03.179715627Z" level=info msg="Received container exit code: 0, message: " Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28163]: conmon <ninfo>: Timed out, killing main loop Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28163]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:03.942708167Z" level=info msg="Received container exit code: -1, message: command timed out" Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal conmon[28163]: conmon <nwarn>: stdio_input read failed Resource temporarily unavailable Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal atomic-openshift-node[16620]: E0629 23:02:03.943223 16620 remote_runtime.go:332] ExecSync a433aecc2bc7a1aa4c16162364b6f48818db092057711648686dc74b66ecdfbb '/bin/bash -c /opt/eap/bin/livenessProbe.sh' from runtime service failed: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1 Jun 29 23:02:03 ip-172-31-24-55.us-west-2.compute.internal atomic-openshift-node[16620]: I0629 23:02:03.943310 16620 prober.go:106] Liveness probe for "eap-app-3-4nfp9_eap64-mysql-s2i-u-2-2-67-26(742814d5-7bee-11e8-9beb-0223e9428e08):eap-app" errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1 Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28244]: conmon <ninfo>: about to waitpid: 28245 Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28244]: conmon <ninfo>: container PID: 28254 Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28244]: conmon <ninfo>: container 28254 exited with status 0 Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:05.605255183Z" level=info msg="Received container exit code: 0, message: " Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28268]: conmon <ninfo>: about to waitpid: 28269 Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28268]: conmon <ninfo>: container PID: 28278 Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal conmon[28268]: conmon <ninfo>: container 28278 exited with status 0 Jun 29 23:02:05 ip-172-31-24-55.us-west-2.compute.internal crio[13530]: time="2018-06-29 23:02:05.690385238Z" level=info msg="Received container exit code: 0, message: " Jun 29 23:02:08 ip-172-31-24-55.us-west-2.compute.internal conmon[28298]: conmon <ninfo>: about to waitpid: 28299 Jun 29 23:02:08 ip-172-31-24-55.us-west-2.compute.internal conmon[28298]: conmon <ninfo>: container PID: 28308
We missed the necessary commit in the v1.10.4 tag by mistake. I am tagging a v1.10.5 for crio with the fix.
https://github.com/kubernetes-incubator/cri-o/releases/tag/v1.10.5 has been released. We will have new builds for cri-o 1.10.5 soon.
crio-1.10.5 is available in 3.10.12-1
I still see following in logs in cri-o-1.10.5-1.rhaos3.10.git4c81456.el7.x86_64 Jul 05 14:07:46 ip-172-31-24-55.us-west-2.compute.internal conmon[86487]: conmon <ninfo>: container PID: 86497 Jul 05 14:07:46 ip-172-31-24-55.us-west-2.compute.internal conmon[86516]: conmon <ninfo>: container PID: 86526 Jul 05 14:07:47 ip-172-31-24-55.us-west-2.compute.internal conmon[86544]: conmon <ninfo>: container PID: 86554 Jul 05 14:07:47 ip-172-31-24-55.us-west-2.compute.internal conmon[86574]: conmon <ninfo>: container PID: 86584 Jul 05 14:07:48 ip-172-31-24-55.us-west-2.compute.internal conmon[86605]: conmon <ninfo>: container PID: 86615 Jul 05 14:07:48 ip-172-31-24-55.us-west-2.compute.internal conmon[86629]: conmon <ninfo>: container PID: 86639 Jul 05 14:07:48 ip-172-31-24-55.us-west-2.compute.internal conmon[86654]: conmon <ninfo>: container PID: 86664 Jul 05 14:07:49 ip-172-31-24-55.us-west-2.compute.internal conmon[86679]: conmon <ninfo>: container PID: 86689 Jul 05 14:07:49 ip-172-31-24-55.us-west-2.compute.internal conmon[86708]: conmon <ninfo>: container PID: 86718
Vikas, I believe we left the "container PID" logs as it would be helpful in debugging future issues. Mrunal, should we leave the container PID logs, or should I go ahead and remove them as well?
Spoke with Mrunal, we want to retain the container PID logs. Moving this back to ON_QA.
Here is the frequency of these errors # journalctl | grep 'conmon <ninfo>: container PID' | wc -l 398638 If we are OK with this frequency without increasing loglevel I will make this bz verified. I think this should be printed at debug level or when loglevel is increased.
these messages were printed in less then 2 days in a crio cluster on compute node.
Vikas, we are removing this log in https://github.com/kubernetes-incubator/cri-o/pull/1692.
Please set it ON_QA again when its available in puddle.
Fixed in https://github.com/kubernetes-incubator/cri-o/pull/1692. Vikas please try it out.
Verified that there is no log spam: # cat crio-logs.log | grep 'conmon <ninfo>: container PID' | wc -l 0 # cat crio-logs.log | grep -i "Timed out, killing main loop" | wc -l 0 # cat crio-logs.log | grep "about to waitpid:" | wc -l 0 # cat crio-logs.log | grep "Received container exit code:" | wc -l 0 Verified version info: openshift v3.11.0-0.17.0 kubernetes v1.11.0+d4cacc0 oc v3.11.0-0.14.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652