Bug 2027927 - sandbox creation fails due to obsolete option in /etc/containers/storage.conf
Summary: sandbox creation fails due to obsolete option in /etc/containers/storage.conf
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.10.0
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks: 2027926 2109599
Reported: 2021-12-01 04:28 UTC by Fraser Tweedale
Modified: 2022-07-21 15:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:31:00 UTC
Target Upstream Version:
Embargoed:



Links
System                   ID               Private   Priority   Status   Summary   Last Updated
Red Hat Product Errata   RHSA-2022:0056   0         None       None     None      2022-03-10 16:31:22 UTC

Description Fraser Tweedale 2021-12-01 04:28:43 UTC
This bug was initially created as a copy of Bug #2027926

I am copying this bug because: MCO PR process seems to require it...



Description of problem:

The `storage.conf(5)` `override_kernel_check` option was removed from the
containers/storage library in early 2019:

  https://github.com/containers/storage/commit/bd6cac944a0f808561eb3ab41ff0db73fc2596cb

With the recent versions of CRI-O shipped in OCP >= 4.9, the presence of
this option causes sandbox creation to fail when using user namespaces:

    Warning  FailedCreatePodSandBox  SSs (xN over MMm)  kubelet
    (combined from similar events): Failed to create pod sandbox: rpc
    error: code = Unknown desc = error creating pod sandbox with name
    "{NAME}": error creating an ID-mapped copy of layer "{HASH}":
    time="{TIMESTAMP}" level=warning msg="Failed to decode the keys
    [\"storage.options.override_kernel_check\"] from
    \"/etc/containers/storage.conf\"."



Version-Release number of selected component (if applicable): OCP 4.9.any


How reproducible: always, when requesting user namespace


Steps to Reproduce:

See blog post: https://frasertweedale.github.io/blog-redhat/posts/2021-07-22-openshift-systemd-workload-demo.html


Actual results: sandbox creation fails (see error message above)

Expected results: sandbox creation succeeds


Additional info:

Removal of `override_kernel_check` option from /etc/containers/storage.conf resolves the issue.
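
As an illustration of that workaround, here is a minimal Go sketch that drops the obsolete key from the text of a storage.conf file. The `stripOption` helper and the sample configuration in `main` are hypothetical and are not taken from the MCO change. The sketch filters line by line rather than parsing TOML, so it assumes the option sits on a single line of its own.

```go
package main

import (
	"fmt"
	"strings"
)

// stripOption returns conf with every line that assigns the given key removed.
// Hypothetical helper: a line-based filter, not a TOML parser. Commented-out
// lines and keys that merely share the prefix are left untouched.
func stripOption(conf, key string) string {
	var kept []string
	for _, line := range strings.Split(conf, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, key) {
			rest := strings.TrimSpace(strings.TrimPrefix(trimmed, key))
			if strings.HasPrefix(rest, "=") {
				continue // this line assigns the obsolete key; drop it
			}
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	// Sample content is an assumption made for illustration only.
	conf := "[storage]\ndriver = \"overlay\"\n\n[storage.options]\noverride_kernel_check = \"true\"\n"
	fmt.Print(stripOption(conf, "override_kernel_check"))
}
```

On a cluster the file is managed by the Machine Config Operator, so a manual edit like this is only useful for confirming the diagnosis; the durable fix is the MCO change referenced below.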

MCO PR (already merged): https://github.com/openshift/machine-config-operator/pull/2845

Backport PR: https://github.com/openshift/machine-config-operator/pull/2848


It is possible that CRI-O itself could be modified to handle this situation.
The offending code comes from containers/storage/store.go (this library is vendored in
the cri-o repo):

  mappedLayer, _, err := rlstore.Put("", parentLayer, nil, layer.MountLabel, nil, &layerOptions, false, nil, nil)
  if err != nil {
          return nil, errors.Wrapf(err, "error creating an ID-mapped copy of layer %q", layer.ID)
  }

In this case, mappedLayer is non-nil, and err is *also* non-nil and represents a warning
rather than an unrecoverable error.  The program could optimistically continue when mappedLayer
is non-nil.
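
A rough sketch of that idea in Go follows. It is not the containers/storage code: `Layer`, `putFunc`, `warnPut`, `failPut` and `createMappedLayer` are hypothetical stand-ins, and the real `rlstore.Put` has a different signature. It also assumes that a non-nil layer returned alongside an error is actually usable, which would need to be confirmed against the library before adopting such a change.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Layer is a hypothetical stand-in for the storage library's layer type.
type Layer struct{ ID string }

// putFunc is a hypothetical, simplified stand-in for rlstore.Put.
type putFunc func(parentID string) (*Layer, error)

// warnPut simulates the situation in this bug: a layer is produced,
// but a non-nil error carrying only a warning is returned with it.
func warnPut(parentID string) (*Layer, error) {
	return &Layer{ID: parentID + "-mapped"}, errors.New("failed to decode some configuration keys")
}

// failPut simulates an unrecoverable failure: no layer at all.
func failPut(parentID string) (*Layer, error) {
	return nil, errors.New("layer store unavailable")
}

// createMappedLayer fails only when no layer was produced. If a layer
// came back together with an error, the error is logged as a warning
// and the layer is returned.
func createMappedLayer(parentID string, put putFunc) (*Layer, error) {
	mappedLayer, err := put(parentID)
	if err != nil {
		if mappedLayer == nil {
			return nil, fmt.Errorf("error creating an ID-mapped copy of layer %q: %w", parentID, err)
		}
		log.Printf("warning while creating an ID-mapped copy of layer %q: %v", parentID, err)
	}
	return mappedLayer, nil
}

func main() {
	for _, put := range []putFunc{warnPut, failPut} {
		layer, err := createMappedLayer("abc123", put)
		fmt.Println(layer != nil, err)
	}
}
```

The trade-off is that a genuine failure which still happens to return a layer would be downgraded to a log line, so distinguishing warnings from errors by type inside the library would be the more robust approach.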

Comment 3 Sunil Choudhary 2021-12-09 10:47:04 UTC
Verified on 4.10.0-0.nightly-2021-12-06-201335.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-06-201335   True        False         3h52m   Cluster version is 4.10.0-0.nightly-2021-12-06-201335

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE     VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-141-168.us-east-2.compute.internal   Ready    master   4h7m    v1.22.1+6859754   10.0.141.168   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-159-111.us-east-2.compute.internal   Ready    worker   3h59m   v1.22.1+6859754   10.0.159.111   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-179-130.us-east-2.compute.internal   Ready    worker   3h59m   v1.22.1+6859754   10.0.179.130   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-191-19.us-east-2.compute.internal    Ready    master   4h6m    v1.22.1+6859754   10.0.191.19    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-195-42.us-east-2.compute.internal    Ready    master   4h6m    v1.22.1+6859754   10.0.195.42    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-218-47.us-east-2.compute.internal    Ready    worker   4h2m    v1.22.1+6859754   10.0.218.47    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8

$ oc debug node/ip-10-0-159-111.us-east-2.compute.internal
Starting pod/ip-10-0-159-111us-east-2computeinternal-debug ...
...

sh-4.4# cat /etc/containers/storage.conf | grep -i "override_kernel_check"
sh-4.4#

Comment 6 errata-xmlrpc 2022-03-10 16:31:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

