Bug 2028590 - container-mount-namespace workaround breaks fully-baremetal multi-node deployments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jim Ramsay
QA Contact: Joshua Clark
URL:
Whiteboard: Telco; Telco:RAN
Depends On:
Blocks:
 
Reported: 2021-12-02 17:17 UTC by Jim Ramsay
Modified: 2022-07-11 15:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-11 15:28:27 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift-kni cnf-features-deploy pull 849 0 None Merged ztp: Bug 2028590: Move our execstart last 2022-03-09 14:49:57 UTC
Github openshift-kni cnf-features-deploy pull 850 0 None Merged features: Move our mount namespace execstart last 2022-03-09 14:50:08 UTC
Github openshift machine-config-operator pull 2858 0 None Merged Bug 2028590: Remove unneeded crio.service.d drop-in 2022-03-09 14:50:13 UTC
Red Hat Product Errata RHBA-2022:5514 0 None None None 2022-07-11 15:28:44 UTC

Description Jim Ramsay 2021-12-02 17:17:37 UTC
Description of problem:

When deploying a multi-node cluster on all bare-metal nodes, the MCO adds an additional CRI-O drop-in that circumvents the container-mount-namespace drop-in. As a result, CRI-O runs in the base mount namespace while kubelet runs in the hidden namespace, and containers fail to get any kubelet-mounted filesystems (such as secrets and tokens).

Version-Release number of selected component (if applicable):

Affects all OpenShift versions 4.5 and later when combined with the container-mount-namespace workaround originally released in cnf-features-deploy 4.8, but only if the MCO's ControllerConfig platform is "baremetal" or "vsphere".

How reproducible:

100% on affected platforms

Steps to Reproduce:

1. Deploy OpenShift 4.5 or later, multi-node with baremetal or vsphere platform
2. Add the container-mount-namespace workaround from cnf-features-deploy 4.8

Actual results:

CRI-O runs in a different mount namespace from kubelet, so containers started by CRI-O do not see secrets or tokens mounted by kubelet.

Expected results:

CRI-O and kubelet must be in the same mount namespace so that containers started by CRI-O see secrets and tokens mounted by kubelet.

Comment 1 Jim Ramsay 2021-12-03 16:21:09 UTC
It turns out that the conflicting drop-in owned by MCO for the baremetal and vsphere platforms is redundant and unneeded.  CRI-O already respects the environment variable that is being set, without any need for editing the command line in this drop-in.

I've opened a PR to remove the unneeded drop-ins here: https://github.com/openshift/machine-config-operator/pull/2858

In addition, I will open a second PR to cnf-features-deploy that will solve this from the other end, making our drop-in compatible with MCO even if it is still applying its drop-in.

Comment 6 Joshua Clark 2022-03-23 19:42:49 UTC
QE Verified fixed. MachineConfig looks good. CRI-O and kubelet have the same mount namespace:

[core@helix16 ~]$ cat /proc/62743/mountinfo |grep -i namespace
331 330 0:24 /container-mount-namespace /run/container-mount-namespace rw,nosuid,nodev shared:188 - tmpfs tmpfs rw,seclabel,mode=755
[core@helix16 ~]$ cat /proc/10745/mountinfo |grep -i namespace
331 330 0:24 /container-mount-namespace /run/container-mount-namespace rw,nosuid,nodev shared:188 - tmpfs tmpfs rw,seclabel,mode=755
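
For anyone re-checking this on another node, here is a minimal sketch of a more direct comparison than grepping mountinfo: it compares the mount-namespace links under /proc for two PIDs. This is not part of the QE procedure above. The helper names mntns_of and same_mntns are hypothetical, and reading another user's /proc/<pid>/ns/mnt normally requires root.

```sh
#!/bin/sh
# Sketch only: report whether two processes share a mount namespace.
# Helper names are illustrative and do not come from this bug report.

# mntns_of PID: print the mount-namespace link target for PID,
# which looks like "mnt:[<inode>]". Fails if the PID does not exist
# or the link is not readable.
mntns_of() {
    readlink "/proc/$1/ns/mnt"
}

# same_mntns PID_A PID_B: exit 0 if both PIDs are in the same mount
# namespace, 1 if they differ, 2 if either namespace cannot be read.
same_mntns() {
    ns_a=$(mntns_of "$1") || return 2
    ns_b=$(mntns_of "$2") || return 2
    [ -n "$ns_a" ] && [ "$ns_a" = "$ns_b" ]
}
```

Assuming the processes are named crio and kubelet on the node, a check could look like: same_mntns "$(pgrep -o -x crio)" "$(pgrep -o -x kubelet)" && echo "same mount namespace". On an affected node the function would be expected to return 1.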

Comment 9 errata-xmlrpc 2022-07-11 15:28:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.22 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5514

