+++ This bug was initially created as a clone of Bug #1826895 +++

Description of problem:
This BZ splits out Casey's reported symptom of pods not starting on reboots:
https://bugzilla.redhat.com/show_bug.cgi?id=1785399#c19
We have looked into the logs and are going to wipe the CRI-O state on reboot.

Version-Release number of selected component (if applicable):
4.5 and 4.4

How reproducible:
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1670/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1940/artifacts/e2e-gcp-op/
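For reference, the planned mechanism is the crio-wipe systemd unit running `crio wipe` during boot, before crio starts. A minimal way to inspect it on a node (unit and subcommand names assumed from the crio-wipe service used in the verification steps below; exact unit contents may differ by release):

sh-4.4# systemctl cat crio-wipe      # shows the unit that runs 'crio wipe' before crio starts
sh-4.4# systemctl status crio-wipe   # should report success on each boot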
PR to fix in 4.4 is linked (the machine-config-operator will need another, small fix).
This can be verified as follows: upgrade any node to the new version of CRI-O and MCO, then check `journalctl -u crio-wipe` and verify the string 'wiping containers' is there, hopefully followed by a list of containers that were wiped. Also, `crictl pods` should list only pods that are Ready and newer than `uptime` (all pods created before the node reboot were wiped). A minimal check sketch is included below.
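A minimal check sketch, run from a debug shell on the node (commands taken from the comment above; the exact log wording may vary by CRI-O version):

sh-4.4# journalctl -u crio-wipe -b | grep -i "wiping containers"   # wipe should have run on this boot
sh-4.4# uptime                                                     # note the time since the last reboot
sh-4.4# crictl pods                                                # every remaining pod should be Ready and newer than the reboot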
s/Upgrade/Reboot/g in the above comment. Upgrading the node to that version will involve a reboot, but the real test is to see whether containers are wiped without an upgrade and only on reboot (see the sketch below).
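A sketch of that reboot-only test (the node name is a placeholder; this assumes cluster access via `oc debug`):

$ oc debug node/<worker-node> -- chroot /host systemctl reboot
  # wait for the node to come back, then:
$ oc debug node/<worker-node>
sh-4.4# chroot /host
sh-4.4# journalctl -u crio-wipe -b | grep -i "wiping containers"
sh-4.4# crictl pods   # only pods created after the reboot should remain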
CRI-O PR merged; waiting for the MCO PR.
We need the following:
1. Code merged to upstream - DONE
2. RPM built in brew - Peter will ask Lokesh to do this
3. ART makes a puddle - they were pinged in Slack
4. RHCOS pulls RPMs from the puddle - on a timer, doesn't need a person
After those four are done, the MCO PR can pass tests and merge.
1. crio code merged to upstream (https://github.com/cri-o/cri-o/pull/3635) - DONE
2. RPM built in brew (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1177645) - DONE
3. ART signs/pulls into puddle - in progress - Luke
4. RHCOS pulls RPMs from puddle - on a timer, doesn't need a person
5. Installer PR: ashcrow/mrunal
6. MCO PR merges (https://github.com/openshift/machine-config-operator/pull/1679)
7. QE verifies this BZ
1. crio code merged to upstream (https://github.com/cri-o/cri-o/pull/3635) - DONE
2. RPM built in brew (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1177645) - DONE
3. ART signs/pulls into puddle - DONE
4. RHCOS pulls RPMs from puddle - on a timer, doesn't need a person
5. Installer PR: ashcrow/mrunal
6. MCO PR merges (https://github.com/openshift/machine-config-operator/pull/1679)
7. QE verifies this BZ
1. crio code merged to upstream (https://github.com/cri-o/cri-o/pull/3635) - DONE
2. RPM built in brew (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1177645) - DONE
3. ART signs/pulls into puddle - DONE
4. RHCOS pulls RPMs from puddle - DONE
5. Installer PR: https://github.com/openshift/installer/pull/3508
6. MCO PR merges (https://github.com/openshift/machine-config-operator/pull/1679)
7. QE verifies this BZ
Verified with version: 4.4.0-0.nightly-2020-04-26-205915

sh-4.4# crictl version
Version:            0.1.0
RuntimeName:        cri-o
RuntimeVersion:     1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
RuntimeApiVersion:  v1alpha1

$ oc adm release info --commit-urls | grep machine-config-operator
machine-config-operator  https://github.com/openshift/machine-config-operator/commit/c83f295e07d1cfd5c3124dc140bcdb10f6e094ae
(PR #1679 merged)

After the worker node reboot, check the logs as follows:

sh-4.4# journalctl -u crio-wipe | grep -i "wiping containers"
Apr 27 07:02:13 ip-10-0-165-102 crio[1163]: time="2020-04-27 07:02:13.934775271Z" level=info msg="wiping containers"
Apr 27 07:02:13 ip-10-0-165-102 crio[1163]: time="2020-04-27 07:02:13.950494123Z" level=info msg="wiping containers"
Apr 27 07:02:13 ip-10-0-165-102 crio[1163]: time="2020-04-27 07:02:13.963384553Z" level=info msg="wiping containers"

sh-4.4# crictl pods
POD ID          CREATED              STATE   NAME                                             NAMESPACE                  ATTEMPT
de3b1d5ee6597   About a minute ago   Ready   ip-10-0-165-102us-east-2computeinternal-debug   default                    0
72a40a58315fd   2 minutes ago        Ready   node-ca-wp874                                    openshift-image-registry   0
0db1c6909c49c   2 minutes ago        Ready   multus-5f9gs                                     openshift-multus           0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581