Bug 1826896
Summary: | [4.4] run crio-wipe on reboots to solve error reserving pod name across reboots | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ryan Phillips <rphillips> |
Component: | Node | Assignee: | Peter Hunt <pehunt> |
Status: | CLOSED ERRATA | QA Contact: | MinLi <minmli> |
Severity: | urgent | Docs Contact: | |
Priority: | high | ||
Version: | 4.4 | CC: | aos-bugs, jhou, jokerman, kgarriso, mpatel, pehunt, schoudha, scuppett, wking, xtian |
Target Milestone: | --- | ||
Target Release: | 4.4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1826895 | Environment: | |
Last Closed: | 2020-05-04 11:50:00 UTC | Type: | --- |
Bug Depends On: | 1826895 | ||
Bug Blocks: |
Description
Ryan Phillips
2020-04-22 17:41:13 UTC
PR to fix in 4.4 linked (the machine config operator will need another, small fix).

This can be verified as follows: Upgrade any node with the new version of CRI-O and MCO, check `journalctl -u crio-wipe`, and verify that the string 'wiping containers' is there, hopefully followed by a list of containers that were wiped. Also, `crictl pods` should list only pods that are Ready and newer than `uptime` (all pods created before the node reboot were wiped).

s/Upgrade/Reboot/g in the above comment. Upgrading the node to that version will involve a reboot, but the real test is to see whether containers are wiped without an upgrade and only on reboot.

CRI-O PR merged; waiting for the MCO PR.

We need the following:
1. code merged to upstream - DONE
2. RPM built in brew - peter will ask lokesh to do this
3. ART makes a puddle - they pinged in slack
4. RHCOS pulls RPMs from puddle - on a timer, doesn't need a person
After those 4 are done, the MCO PR can pass tests and merge.

1. crio code merged to upstream (https://github.com/cri-o/cri-o/pull/3635) - DONE
2. RPM built in brew (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1177645) - DONE
3. ART signs/pulls into puddle - in progress - luke
4. RHCOS pulls RPMs from puddle - on a timer, doesn't need a person
5. Installer PR: ashcrow/mrunal
6. MCO PR merges (https://github.com/openshift/machine-config-operator/pull/1679)
7. QE verifies this BZ

1. crio code merged to upstream (https://github.com/cri-o/cri-o/pull/3635) - DONE
2. RPM built in brew (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1177645) - DONE
3. ART signs/pulls into puddle - DONE
4. RHCOS pulls RPMs from puddle - on a timer, doesn't need a person
5. Installer PR: ashcrow/mrunal
6. MCO PR merges (https://github.com/openshift/machine-config-operator/pull/1679)
7. QE verifies this BZ

1. crio code merged to upstream (https://github.com/cri-o/cri-o/pull/3635) - DONE
2. RPM built in brew (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1177645) - DONE
3. ART signs/pulls into puddle - DONE
4. RHCOS pulls RPMs from puddle - DONE
5. Installer PR: https://github.com/openshift/installer/pull/3508
6. MCO PR merges (https://github.com/openshift/machine-config-operator/pull/1679)
7. QE verifies this BZ

Verified with version: 4.4.0-0.nightly-2020-04-26-205915

    sh-4.4# crictl version
    Version: 0.1.0
    RuntimeName: cri-o
    RuntimeVersion: 1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
    RuntimeApiVersion: v1alpha1

    $ oc adm release info --commit-urls | grep machine-config-operator
    machine-config-operator https://github.com/openshift/machine-config-operator/commit/c83f295e07d1cfd5c3124dc140bcdb10f6e094ae

(pr#1679 merged)

After a worker node reboot, check the logs as follows:

    sh-4.4# journalctl -u crio-wipe | grep -i "wiping containers"
    Apr 27 07:02:13 ip-10-0-165-102 crio[1163]: time="2020-04-27 07:02:13.934775271Z" level=info msg="wiping containers"
    Apr 27 07:02:13 ip-10-0-165-102 crio[1163]: time="2020-04-27 07:02:13.950494123Z" level=info msg="wiping containers"
    Apr 27 07:02:13 ip-10-0-165-102 crio[1163]: time="2020-04-27 07:02:13.963384553Z" level=info msg="wiping containers"

    sh-4.4# crictl pods
    POD ID          CREATED              STATE   NAME                                            NAMESPACE                  ATTEMPT
    de3b1d5ee6597   About a minute ago   Ready   ip-10-0-165-102us-east-2computeinternal-debug   default                    0
    72a40a58315fd   2 minutes ago        Ready   node-ca-wp874                                   openshift-image-registry   0
    0db1c6909c49c   2 minutes ago        Ready   multus-5f9gs                                    openshift-multus           0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
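
The log check from the verification steps above could be scripted roughly as follows. This is only a sketch: `count_wipe_messages` and `wipe_ran` are hypothetical helper names, not part of CRI-O or the MCO, and the `-b` (current boot only) and `--no-pager` options to `journalctl` are an addition that the original comments do not use.

```shell
#!/usr/bin/env bash
# Sketch: detect the "wiping containers" message in crio-wipe journal output.
# Both helpers are hypothetical names introduced for illustration only.

# Reads journal text on stdin and prints how many lines mention the wipe.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's own exit status at 0 while still printing "0".
count_wipe_messages() {
    grep -ci "wiping containers" || true
}

# Succeeds (exit status 0) only if stdin contains at least one wipe message.
wipe_ran() {
    [ "$(count_wipe_messages)" -gt 0 ]
}

# Example use on a node after a reboot (assumption: limiting the journal
# to the current boot with -b is what you want to check):
#   journalctl -u crio-wipe -b --no-pager | wipe_ran && echo "crio-wipe ran"
```

Reading from stdin keeps the helpers independent of `journalctl`, so the same check can be run against a saved log file.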