Bug 1704410 - Symlinks under /var/lib/containers/storage/overlay/l are lost on reboot
Summary: Symlinks under /var/lib/containers/storage/overlay/l are lost on reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.1.0
Assignee: Urvashi Mohnani
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks: 1708442
Reported: 2019-04-29 16:30 UTC by Urvashi Mohnani
Modified: 2019-06-04 10:48 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1708442 (view as bug list)
Environment:
Last Closed: 2019-06-04 10:48:10 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:48:17 UTC

Description Urvashi Mohnani 2019-04-29 16:30:41 UTC
The symlinks created under `/var/lib/containers/storage/overlay/l` sometimes disappear on reboot. CRI-O then hangs, because without these symlinks it cannot create any pods or containers. Each symlink points to the `diff` directory of a layer under `/var/lib/containers/storage/overlay`.

This is what the error message looks like:


```
Apr 29 15:58:24 r640-u10.dev1.kni.lab.eng.bos.redhat.com hyperkube[4516]: E0429 15:58:24.958202    4516 pod_workers.go:190] Error syncing pod a3ad435d3c884e79f8b18c15fedef05d ("kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager(a3ad435d3c884e79f8b18c15fedef05d)"), skipping: failed to "CreatePodSandbox" for "kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager(a3ad435d3c884e79f8b18c15fedef05d)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager(a3ad435d3c884e79f8b18c15fedef05d)\" failed: rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager_a3ad435d3c884e79f8b18c15fedef05d_1\": error creating read-write layer with ID \"4c9532fb1f00d098820507fdc903bbd780b6176fc9e2c6bde551e02528f37843\": symlink ../4c9532fb1f00d098820507fdc903bbd780b6176fc9e2c6bde551e02528f37843/diff /var/lib/containers/storage/overlay/l/UID7QVVCAIRUT4QI4TOYARXSC4: no such file or directory"
```
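
To check whether a node is affected, something like the following rough Python sketch could list the layers whose symlink under `l` is missing and recreate them. It is not the code from any fix for this bug. It assumes that each layer directory contains a `link` file holding the short name of its symlink (my understanding of the overlay driver's on-disk layout, not something stated in this report), and it uses the `../<layer-id>/diff` target shown in the error message above. The function names `find_missing_links` and `recreate_links` are made up for illustration.

```python
import os
from pathlib import Path


def find_missing_links(overlay_root):
    """Return (layer_id, link_name) pairs whose symlink under l/ is missing or dangling."""
    root = Path(overlay_root)
    link_dir = root / "l"
    missing = []
    for layer in sorted(root.iterdir()):
        # Assumption: every layer directory has a "link" file with the short link name.
        link_file = layer / "link"
        # Skip l/ itself and anything that does not look like a layer directory.
        if layer.name == "l" or not link_file.is_file():
            continue
        link_name = link_file.read_text().strip()
        if not link_name:
            continue
        # Path.exists() follows symlinks, so a dangling link is reported as well.
        if not (link_dir / link_name).exists():
            missing.append((layer.name, link_name))
    return missing


def recreate_links(overlay_root, missing):
    """Recreate each link as l/<link_name> -> ../<layer_id>/diff."""
    link_dir = Path(overlay_root) / "l"
    link_dir.mkdir(exist_ok=True)
    for layer_id, link_name in missing:
        link_path = link_dir / link_name
        if link_path.is_symlink():
            link_path.unlink()  # drop a dangling link before recreating it
        os.symlink(os.path.join("..", layer_id, "diff"), link_path)
```

Calling `find_missing_links("/var/lib/containers/storage/overlay")` as root only reads the directory tree, so it should be safe to run on an affected node. `recreate_links` modifies the store, so it is probably best run with CRI-O stopped and only after reviewing the reported list.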

Comment 1 Urvashi Mohnani 2019-04-29 16:34:31 UTC
The fix went in with https://github.com/containers/storage/pull/326. Now waiting for a release to be cut.

Comment 2 Urvashi Mohnani 2019-04-30 12:01:19 UTC
An additional fix is in https://github.com/containers/storage/pull/333

Comment 3 Urvashi Mohnani 2019-05-01 08:46:48 UTC
Fix is in the new cri-o v1.13.7 build https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=21395072

Comment 4 weiwei jiang 2019-05-07 09:50:25 UTC
Checked with 4.1.0-0.nightly-2019-05-05-070156, 1.13.9-1.rhaos4.1.gitd70609a.el8

Rebooted about 5 times and none of the nodes hit this error, so moving to VERIFIED.

Comment 9 errata-xmlrpc 2019-06-04 10:48:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

