Bug 1704410

Summary: Symlinks under /var/lib/containers/storage/overlay/l are lost on reboot
Product: OpenShift Container Platform Reporter: Urvashi Mohnani <umohnani>
Component: Containers Assignee: Urvashi Mohnani <umohnani>
Status: CLOSED ERRATA QA Contact: weiwei jiang <wjiang>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: adeshpan, aos-bugs, dwalsh, eparis, jokerman, mmccomas
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1708442 Environment:
Last Closed: 2019-06-04 10:48:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1708442    

Description Urvashi Mohnani 2019-04-29 16:30:41 UTC
The symlinks created under `/var/lib/containers/storage/overlay/l` sometimes disappear on reboot. This causes CRI-O to hang, because it cannot create any pods or containers while the symlinks are missing. The symlinks point to the diff directories of the layers under `/var/lib/containers/storage/overlay`.

This is what the error message looks like:


```
Apr 29 15:58:24 r640-u10.dev1.kni.lab.eng.bos.redhat.com hyperkube[4516]: E0429 15:58:24.958202    4516 pod_workers.go:190] Error syncing pod a3ad435d3c884e79f8b18c15fedef05d ("kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager(a3ad435d3c884e79f8b18c15fedef05d)"), skipping: failed to "CreatePodSandbox" for "kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager(a3ad435d3c884e79f8b18c15fedef05d)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager(a3ad435d3c884e79f8b18c15fedef05d)\" failed: rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_kube-controller-manager-r640-u10.dev1.kni.lab.eng.bos.redhat.com_openshift-kube-controller-manager_a3ad435d3c884e79f8b18c15fedef05d_1\": error creating read-write layer with ID \"4c9532fb1f00d098820507fdc903bbd780b6176fc9e2c6bde551e02528f37843\": symlink ../4c9532fb1f00d098820507fdc903bbd780b6176fc9e2c6bde551e02528f37843/diff /var/lib/containers/storage/overlay/l/UID7QVVCAIRUT4QI4TOYARXSC4: no such file or directory"
```
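To check whether a node is affected, something along the lines of the sketch below may help. It is not part of the fix. It walks the overlay directory and reports layers whose symlink under `l` is absent or dangling. It assumes each layer directory contains a `link` file holding the short name used under `l`, which matches my understanding of the overlay driver layout but is not shown in this report. The function name `find_missing_links` is made up for illustration.

```python
import os


def find_missing_links(overlay_root):
    """Return (layer_id, link_name) pairs whose symlink under l/ is absent or dangling.

    Assumption: every layer directory holds a `link` file with its short name.
    """
    missing = []
    if not os.path.isdir(overlay_root):
        return missing
    link_dir = os.path.join(overlay_root, "l")
    for layer_id in sorted(os.listdir(overlay_root)):
        link_file = os.path.join(overlay_root, layer_id, "link")
        # Skip l/ itself and anything that is not a layer directory with a link file.
        if layer_id == "l" or not os.path.isfile(link_file):
            continue
        with open(link_file) as f:
            link_name = f.read().strip()
        # os.path.exists follows symlinks, so a dangling link is reported as well.
        if not link_name or not os.path.exists(os.path.join(link_dir, link_name)):
            missing.append((layer_id, link_name))
    return missing
```

Calling `find_missing_links("/var/lib/containers/storage/overlay")` as root on a node should return an empty list when the storage is healthy. Going by the error message above, each expected symlink has the form `l/<link_name> -> ../<layer_id>/diff`.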

Comment 1 Urvashi Mohnani 2019-04-29 16:34:31 UTC
The fix went in with https://github.com/containers/storage/pull/326. Now waiting on a release to be cut.

Comment 2 Urvashi Mohnani 2019-04-30 12:01:19 UTC
Additional fix is in https://github.com/containers/storage/pull/333

Comment 3 Urvashi Mohnani 2019-05-01 08:46:48 UTC
Fix is in the new cri-o v1.13.7 build https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=21395072

Comment 4 weiwei jiang 2019-05-07 09:50:25 UTC
Checked with 4.1.0-0.nightly-2019-05-05-070156, 1.13.9-1.rhaos4.1.gitd70609a.el8

Rebooted about 5 times and none of the nodes hit this error, so moving to VERIFIED.

Comment 9 errata-xmlrpc 2019-06-04 10:48:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758