Bug 2027926

Summary: sandbox creation fails due to obsolete option in /etc/containers/storage.conf
Product: OpenShift Container Platform
Reporter: Fraser Tweedale <ftweedal>
Component: Node
Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: aos-bugs
Version: 4.9
Target Release: 4.9.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-01-04 18:41:24 UTC
Type: Bug
Bug Depends On: 2027927, 2109599
Bug Blocks: 2108699

Description Fraser Tweedale 2021-12-01 04:19:07 UTC
Description of problem:

The `storage.conf(5)` `override_kernel_check` option was removed from the
containers/storage library in early 2019:


With the recent versions of CRI-O shipped in OCP 4.9 and later, the presence of
this field causes sandbox creation to fail when using user namespaces:

    Warning  FailedCreatePodSandBox  SSs (xN over MMm)  kubelet
    (combined from similar events): Failed to create pod sandbox: rpc
    error: code = Unknown desc = error creating pod sandbox with name
    "{NAME}": error creating an ID-mapped copy of layer "{HASH}":
    time="{TIMESTAMP}" level=warning msg="Failed to decode the keys
    [\"storage.options.override_kernel_check\"] from
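
The warning names the dotted key `storage.options.override_kernel_check`, which suggests the configuration loader reports keys it could not map to a known option. The following is a minimal sketch of that kind of check, assuming a simplified line-oriented reading of storage.conf. `undecodedKeys` and the `knownKeys` allow-list are hypothetical illustrations, not the containers/storage API, and the allow-list is deliberately incomplete:

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// knownKeys is a HYPOTHETICAL, deliberately incomplete allow-list of dotted
// keys; it is NOT the real set of options accepted by containers/storage.
var knownKeys = map[string]bool{
	"storage.driver":                        true,
	"storage.runroot":                       true,
	"storage.graphroot":                     true,
	"storage.options.additionalimagestores": true,
}

// undecodedKeys returns, sorted, the dotted names of `key = value` lines that
// are not in known. It understands only plain [table] headers, bare keys and
// '#' comment lines; it is a simplified scanner, not a TOML parser.
func undecodedKeys(conf string, known map[string]bool) []string {
	var out []string
	table := ""
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue // blank line or comment
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			table = strings.TrimSpace(strings.Trim(line, "[]"))
			continue // remember the current table name
		}
		eq := strings.Index(line, "=")
		if eq < 0 {
			continue // not an assignment; ignored by this sketch
		}
		key := strings.TrimSpace(line[:eq])
		if table != "" {
			key = table + "." + key
		}
		if !known[key] {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	conf := "[storage]\ndriver = \"overlay\"\n\n[storage.options]\noverride_kernel_check = \"true\"\n"
	if keys := undecodedKeys(conf, knownKeys); len(keys) > 0 {
		// prints: Failed to decode the keys ["storage.options.override_kernel_check"]
		fmt.Printf("Failed to decode the keys %q\n", keys)
	}
}
```

A real loader decodes into typed structs rather than matching strings, but the effect is the same: a key the current library no longer knows about is reported rather than silently accepted.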

Version-Release number of selected component (if applicable): OCP 4.9.any

How reproducible: always, when a user namespace is requested

Steps to Reproduce:

See blog post: https://frasertweedale.github.io/blog-redhat/posts/2021-07-22-openshift-systemd-workload-demo.html

Actual results: sandbox creation fails (see error message above)

Expected results: sandbox creation succeeds

Additional info:

Removing the `override_kernel_check` option from /etc/containers/storage.conf resolves the issue.
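
The workaround is a plain text edit of storage.conf. A minimal sketch of that transformation on the file contents follows; `stripOption` is a hypothetical helper, it does not read or write the file on disk, and it is not the change made in the MCO PRs referenced in this report:

```go
package main

import (
	"fmt"
	"strings"
)

// stripOption is a HYPOTHETICAL helper: it drops every active `key = value`
// assignment whose key is exactly key, keeping comments and all other lines.
// It does not track [table] headers, so a same-named key in another table
// would also be removed.
func stripOption(conf, key string) string {
	lines := strings.Split(conf, "\n")
	kept := lines[:0] // filter in place; safe because we never write ahead of the read index
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		if eq := strings.Index(trimmed, "="); eq >= 0 &&
			!strings.HasPrefix(trimmed, "#") &&
			strings.TrimSpace(trimmed[:eq]) == key {
			continue // obsolete option: drop the line
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	conf := "[storage.options]\noverride_kernel_check = \"true\"\nadditionalimagestores = []\n"
	fmt.Print(stripOption(conf, "override_kernel_check"))
}
```

Commented-out occurrences are left in place, since only active assignments are decoded by the loader.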

MCO PR (already merged): https://github.com/openshift/machine-config-operator/pull/2845

Backport PR: https://github.com/openshift/machine-config-operator/pull/2848

It is possible that CRI-O itself could be modified to handle this situation.
The offending code comes from containers/storage/store.go (this library is vendored in
the CRI-O repo):

  mappedLayer, _, err := rlstore.Put("", parentLayer, nil, layer.MountLabel, nil, &layerOptions, false, nil, nil)
  if err != nil {
          return nil, errors.Wrapf(err, "error creating an ID-mapped copy of layer %q", layer.ID)
  }

In this case, mappedLayer is non-nil and err is *also* non-nil, but err represents a warning
rather than an unrecoverable error. The program could optimistically continue whenever
mappedLayer is non-nil.

Comment 6 errata-xmlrpc 2022-01-04 18:41:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.12 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.