Bug 1943539
Summary: | crio-wipe is failing to start "Failed to shutdown storage before wiping: A layer is mounted: layer is in use by a container" | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Stephen Benjamin <stbenjam> |
Component: | Node | Assignee: | Peter Hunt <pehunt> |
Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | unspecified | CC: | aos-bugs, augol, dwalsh, jokerman, nyehia, pehunt, pmuller, schoudha, skrenger, smiron, sunil.choudhary, wking |
Version: | 4.8 | Flags: | smiron: needinfo- |
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:55:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1930248 |
Description
Stephen Benjamin
2021-03-26 11:43:17 UTC
This will be fixed in the attached PR.

Fix is in 48.83.202103262224.

Oops, we'll also need this PR.

New fix is in 48.83.202103301119-0.

The aforementioned fixes don't cover all cases. There is still a race between crio-wipe and podman containers run on system boot. To fix these, we'll need a new PR in the MCO (Machine Config Operator).

Was this bug introduced in 4.8, or does it exist in previous versions? If the latter, do we have backport plans?

It was introduced in 4.8; no backports needed!

Hi, I was able to reproduce the issue, and this time I also attached the must-gather taken with --node-name=worker-0-0:

must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ1943539-new-must-gather.tar.gz

```
[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0
Red Hat Enterprise Linux CoreOS 48.84.202106032218-0
  Part of OpenShift 4.8, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.8/architecture/architecture-rhcos.html

---
Last login: Mon Jun 7 12:00:40 2021 from 192.168.123.145
[systemd]
Failed Units: 10
  crio-wipe.service
  kubelet-auto-node-size.service
  node-valid-hostname.service
  sssd.service
  systemd-coredump
  systemd-coredump
  systemd-coredump
  systemd-coredump
  systemd-coredump
  systemd-hostnamed.service
```

```
[kni@provisionhost-0-0 ~]$ oc get nodes
NAME         STATUS                        ROLES    AGE     VERSION
master-0-0   Ready                         master   5h31m   v1.21.0-rc.0+2dfc46b
master-0-1   Ready                         master   5h31m   v1.21.0-rc.0+2dfc46b
master-0-2   Ready                         master   5h31m   v1.21.0-rc.0+2dfc46b
worker-0-0   NotReady,SchedulingDisabled   worker   5h1m    v1.21.0-rc.0+2dfc46b
worker-0-1   Ready                         worker   5h1m    v1.21.0-rc.0+2dfc46b
worker-0-2   Ready                         worker   5h1m    v1.21.0-rc.0+2dfc46b
worker-0-3   Ready                         worker   5h1m    v1.21.0-rc.0+2dfc46b
```

```
[core@localhost ~]$ sudo systemctl status crio-wipe.service
● crio-wipe.service - CRI-O Auto Update Script
   Loaded: loaded (/usr/lib/systemd/system/crio-wipe.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2021-06-07 12:20:54 UTC; 28min ago
 Main PID: 1102 (code=exited, status=1/FAILURE)
      CPU: 116ms

Jun 07 12:20:54 localhost systemd[1]: Starting CRI-O Auto Update Script...
Jun 07 12:20:54 localhost crio[1102]: time="2021-06-07 12:20:54.822651586Z" level=info msg="Starting CRI-O, version: 1.21.1-2.rhaos4.8.gitf635341.el8, git: ()"
Jun 07 12:20:54 localhost crio[1102]: time="2021-06-07 12:20:54.831052663Z" level=fatal msg="Mkdir /var/lib/containers/storage/overlay/compat738689383: no space lef>
Jun 07 12:20:54 localhost systemd[1]: crio-wipe.service: Main process exited, code=exited, status=1/FAILURE
Jun 07 12:20:54 localhost systemd[1]: crio-wipe.service: Failed with result 'exit-code'.
Jun 07 12:20:54 localhost systemd[1]: Failed to start CRI-O Auto Update Script.
Jun 07 12:20:54 localhost systemd[1]: crio-wipe.service: Consumed 116ms CPU time
```

```
Jun 07 12:20:54 localhost crio[1102]: time="2021-06-07 12:20:54.831052663Z" level=fatal msg="Mkdir /var/lib/containers/storage/overlay/compat738689383: no space lef>
```

IDK, but this looks unrelated to the bug here; it just looks like you've run out of disk space.

OK, so moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
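The triage in this thread comes down to telling two crio-wipe failures apart: the "layer is in use by a container" error this bug tracks, and the disk-full error from the June reproduction. A minimal Python sketch of that check follows, assuming the crio-wipe journal text has already been captured. The helper names `classify_crio_wipe_failure` and `has_free_space`, the labels, and the 1 GiB threshold are hypothetical and are not part of CRI-O or the MCO. The match strings are copied from this report.

```python
import shutil

# Failure signatures copied from this bug report. The disk-full one is matched
# by prefix because `systemctl status` truncates long journal lines ("no space lef>").
SIGNATURES = {
    "storage-in-use": "layer is in use by a container",  # the error this bug tracks
    "disk-full": "no space lef",  # error seen in the June reproduction
}


def classify_crio_wipe_failure(journal_text: str) -> str:
    """Return the label of the first known signature found, scanning line by line."""
    for line in journal_text.splitlines():
        for label, needle in SIGNATURES.items():
            if needle in line:
                return label
    return "unknown"


def has_free_space(path: str = "/var/lib/containers/storage",
                   min_bytes: int = 1 << 30) -> bool:
    """True if the filesystem holding `path` has at least `min_bytes` free.

    The default 1 GiB threshold is an arbitrary, hypothetical value.
    """
    return shutil.disk_usage(path).free >= min_bytes


if __name__ == "__main__":
    sample = ('Jun 07 12:20:54 localhost crio[1102]: time="2021-06-07 12:20:54.831052663Z" '
              'level=fatal msg="Mkdir /var/lib/containers/storage/overlay/compat738689383: no space lef>')
    print(classify_crio_wipe_failure(sample))  # disk-full
```

On a node, the input could come from `journalctl -u crio-wipe.service`. A `disk-full` label together with `has_free_space()` returning False points to the situation diagnosed in the thread rather than to this bug.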