This bug was initially created as a copy of Bug #2027926

I am copying this bug because: MCO PR process seems to require it...

Description of problem:

The `override_kernel_check` option of `storage.conf(5)` was removed from the containers/storage library in early 2019:

https://github.com/containers/storage/commit/bd6cac944a0f808561eb3ab41ff0db73fc2596cb

With the recent versions of CRI-O present in OCP >= 4.9, the presence of this field causes sandbox creation to fail when user namespaces are used:

    Warning FailedCreatePodSandBox SSs (xN over MMm) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "{NAME}": error creating an ID-mapped copy of layer "{HASH}": time="{TIMESTAMP}" level=warning msg="Failed to decode the keys [\"storage.options.override_kernel_check\"] from \"/etc/containers/storage.conf\"."

Version-Release number of selected component (if applicable):

OCP 4.9.any

How reproducible:

Always, when requesting a user namespace.

Steps to Reproduce:

See blog post: https://frasertweedale.github.io/blog-redhat/posts/2021-07-22-openshift-systemd-workload-demo.html

Actual results:

Sandbox creation fails (see error message above).

Expected results:

Sandbox creation succeeds.

Additional info:

Removing the `override_kernel_check` option from /etc/containers/storage.conf resolves the issue.

MCO PR (already merged): https://github.com/openshift/machine-config-operator/pull/2845
Backport PR: https://github.com/openshift/machine-config-operator/pull/2848

It is possible that CRI-O itself could be modified to handle this situation.
The offending code comes from containers/storage/store.go (this library is vendored in the cri-o repo):

    mappedLayer, _, err := rlstore.Put("", parentLayer, nil, layer.MountLabel, nil, &layerOptions, false, nil, nil)
    if err != nil {
        return nil, errors.Wrapf(err, "error creating an ID-mapped copy of layer %q", layer.ID)
    }

In this case, mappedLayer is non-nil, and err is *also* non-nil and represents a warning rather than an unrecoverable error. The program could optimistically continue when mappedLayer is non-nil.
Verified on 4.10.0-0.nightly-2021-12-06-201335.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-06-201335   True        False         3h52m   Cluster version is 4.10.0-0.nightly-2021-12-06-201335

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE     VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-141-168.us-east-2.compute.internal   Ready    master   4h7m    v1.22.1+6859754   10.0.141.168   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-159-111.us-east-2.compute.internal   Ready    worker   3h59m   v1.22.1+6859754   10.0.159.111   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-179-130.us-east-2.compute.internal   Ready    worker   3h59m   v1.22.1+6859754   10.0.179.130   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-191-19.us-east-2.compute.internal    Ready    master   4h6m    v1.22.1+6859754   10.0.191.19    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-195-42.us-east-2.compute.internal    Ready    master   4h6m    v1.22.1+6859754   10.0.195.42    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-218-47.us-east-2.compute.internal    Ready    worker   4h2m    v1.22.1+6859754   10.0.218.47    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8

$ oc debug node/ip-10-0-159-111.us-east-2.compute.internal
Starting pod/ip-10-0-159-111us-east-2computeinternal-debug ...
...
sh-4.4# cat /etc/containers/storage.conf | grep -i "override_kernel_check"
sh-4.4#
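The empty grep result above shows the option is gone from the node's storage.conf. The same check could be automated along the lines of the sketch below. The helper name `setsKey` and the sample config strings are illustrative assumptions, not taken from the MCO templates. The sketch does a simple line scan rather than a full TOML parse, and unlike `grep -i` it is case-sensitive and ignores commented-out lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// setsKey reports whether any non-comment line in conf assigns the given key.
// It is a plain line scan, not a TOML parser, so it ignores table headers
// and does not handle multi-line values.
func setsKey(conf, key string) bool {
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) == 2 && strings.TrimSpace(parts[0]) == key {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical config fragments for illustration only.
	before := "[storage.options]\noverride_kernel_check = \"true\"\n"
	after := "[storage.options]\n# override_kernel_check was removed\n"

	fmt.Println("before fix:", setsKey(before, "override_kernel_check"))
	fmt.Println("after fix:", setsKey(after, "override_kernel_check"))
}
```

On a real node, the input would be the contents of /etc/containers/storage.conf read from within the `oc debug` session.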
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056