Bug 1969998 - [OCP 4.9 tracker] kubelet service fails to load EnvironmentFile due to SELinux denial
Summary: [OCP 4.9 tracker] kubelet service fails to load EnvironmentFile due to SELinux denial
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Micah Abbott
QA Contact: HuijingHei
Docs Contact: jfrye
URL:
Whiteboard:
Depends On: 1960769 1973418 2005018
Blocks: mint
 
Reported: 2021-06-09 15:32 UTC by Micah Abbott
Modified: 2021-10-18 17:33 UTC
CC: 26 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The SELinux policy disallowed systemd from reading files under /etc/kubernetes. Consequence: The kubelet fails to start. Fix: Update the SELinux policy to allow systemd to read files labeled kubernetes_file_t. Result: The kubelet starts successfully.
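The fix amounts to adding the missing type-enforcement allow rule. As a rough sketch (hypothetical module name, for illustration only — the real fix shipped in the container-selinux package and is broader than this), the rule audit2allow would generate looks like:

```
# kubelet_envfile.te -- hypothetical local-module sketch, for illustration only.
# The shipped fix lives in container-selinux and covers more than this one rule.
module kubelet_envfile 1.0;

require {
        type init_t;
        type kubernetes_file_t;
        class file read;
}

# Let systemd (init_t) read files labeled kubernetes_file_t,
# e.g. EnvironmentFile= entries under /etc/kubernetes
allow init_t kubernetes_file_t:file read;
```

Such a module would be compiled and loaded with `checkmodule -M -m`, `semodule_package`, and `semodule -i`; on RHCOS the supported route is the updated container-selinux RPM, so treat this only as a stop-gap sketch.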
Clone Of: 1957840
Environment:
Last Closed: 2021-10-18 17:33:22 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:33:39 UTC

Comment 1 Micah Abbott 2021-06-09 15:34:38 UTC
This is a tracker for the inclusion of a fixed `container-selinux` RPM as described in https://bugzilla.redhat.com/show_bug.cgi?id=1960769

Comment 2 Micah Abbott 2021-06-10 13:01:00 UTC
Containers QE signed off on the build in 1960769 (container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.noarch)

Requesting ART to tag it into the RHAOS 4.8 Brew tag:

https://issues.redhat.com/browse/ART-3026

...but they don't have perms, so https://projects.engineering.redhat.com/browse/CLOUDBLD-6031

Comment 3 Micah Abbott 2021-06-11 14:50:12 UTC
We got mixed up somewhere and the tagging wasn't required.

The necessary `container-selinux` build was shipped as part of https://access.redhat.com/errata/RHSA-2021:2371 and RHCOS consumed it as it normally does.

RHCOS 48.84.202106100957-0 was the first to have it.

Comment 6 Michael Nguyen 2021-06-17 18:28:09 UTC
I am still getting the denial on 4.8.0-0.nightly-2021-06-16-190035 with the container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.noarch.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-16-190035   True        False         3h46m   Cluster version is 4.8.0-0.nightly-2021-06-16-190035

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-134-236.us-west-2.compute.internal   Ready    master   4h12m   v1.21.0-rc.0+120883f
ip-10-0-150-206.us-west-2.compute.internal   Ready    worker   4h7m    v1.21.0-rc.0+120883f
ip-10-0-164-27.us-west-2.compute.internal    Ready    master   4h12m   v1.21.0-rc.0+120883f
ip-10-0-183-87.us-west-2.compute.internal    Ready    worker   4h5m    v1.21.0-rc.0+120883f
ip-10-0-210-154.us-west-2.compute.internal   Ready    master   4h13m   v1.21.0-rc.0+120883f
ip-10-0-222-34.us-west-2.compute.internal    Ready    worker   4h6m    v1.21.0-rc.0+120883f

$ oc debug node/ip-10-0-183-87.us-west-2.compute.internal 
Starting pod/ip-10-0-183-87us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd /etc/kubernetes/
sh-4.4# ls
ca.crt	    cni		kubelet-ca.crt	 kubelet.conf  static-pod-resources
cloud.conf  kubeconfig	kubelet-plugins  manifests
sh-4.4# echo TEST=TEST > test-env
sh-4.4# ls -laZ
total 40
drwxr-xr-x.  6 root root system_u:object_r:kubernetes_file_t:s0  193 Jun 17 17:59 .
drwxr-xr-x. 96 root root system_u:object_r:etc_t:s0             8192 Jun 17 17:25 ..
-rw-r--r--.  1 root root system_u:object_r:kubernetes_file_t:s0 1123 Jun 17 17:25 ca.crt
-rw-r--r--.  1 root root system_u:object_r:kubernetes_file_t:s0    0 Jun 17 17:25 cloud.conf
drwxr-xr-x.  3 root root system_u:object_r:kubernetes_file_t:s0   19 Jun 17 13:48 cni
-rw-r--r--.  1 root root system_u:object_r:kubernetes_file_t:s0 6050 Jun 17 13:46 kubeconfig
-rw-r--r--.  1 root root system_u:object_r:kubernetes_file_t:s0 5875 Jun 17 17:25 kubelet-ca.crt
drwxr-xr-x.  3 root root system_u:object_r:kubernetes_file_t:s0   20 Jun 17 13:48 kubelet-plugins
-rw-r--r--.  1 root root system_u:object_r:kubernetes_file_t:s0 1076 Jun 17 17:25 kubelet.conf
drwxr-xr-x.  2 root root system_u:object_r:kubernetes_file_t:s0    6 Jun 17 13:49 manifests
drwxr-xr-x.  3 root root system_u:object_r:kubernetes_file_t:s0   24 Jun 17 13:48 static-pod-resources
-rw-r--r--.  1 root root system_u:object_r:kubernetes_file_t:s0   10 Jun 17 17:59 test-env
sh-4.4# vi /etc/systemd/system/kubelet.service
sh-4.4# audit2allow -a
sh-4.4# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
After=ostree-finalize-staged.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env
EnvironmentFile=/etc/kubernetes/test-env
EnvironmentFile=/etc/node-sizing.env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --cloud-provider=aws \
       \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa0f2cad0e8d907a10bf91b2fe234659495a694235a9e2ef7015eb450ce9f1ba \
      --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY} \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
sh-4.4# systemctl daemon-reload
sh-4.4# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-madv.conf, 20-logging.conf
   Active: active (running) since Thu 2021-06-17 17:25:38 UTC; 35min ago
 Main PID: 1400 (kubelet)
    Tasks: 16 (limit: 48468)
   Memory: 201.0M
      CPU: 2min 27.198s
   CGroup: /system.slice/kubelet.service
           └─1400 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/ku>

Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.109653    1400 scope.go:111] "RemoveContai>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: E0617 18:01:04.109979    1400 remote_runtime.go:334] "Con>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.110010    1400 pod_container_deletor.go:52>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.198000    1400 reconciler.go:196] "operati>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.203632    1400 operation_generator.go:829]>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.298840    1400 reconciler.go:319] "Volume >
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.110755    1400 kubelet.go:1960] "SyncLoop >
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.116113    1400 kubelet.go:1954] "SyncLoop >
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.116180    1400 kubelet.go:2153] "Failed to>
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.116181    1400 reflector.go:225] Stopping >
sh-4.4# systemctl restart kubelet

Removing debug pod ...

== I get booted from the debug pod here because kubelet is gone.  SSH back in through bastion ==

$ ./ssh.sh ip-10-0-183-87.us-west-2.compute.internal
Warning: Permanently added 'ip-10-0-183-87.us-west-2.compute.internal' (ECDSA) to the list of known hosts.

[root@ip-10-0-183-87 kubernetes]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-madv.conf, 20-logging.conf
   Active: inactive (dead) (Result: resources) since Thu 2021-06-17 18:01:34 UTC; 3ms ago
  Process: 1400 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap>
 Main PID: 1400 (code=exited, status=0/SUCCESS)
      CPU: 0

Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: kubelet.service: Failed to load environment files: Permissi>
Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: kubelet.service: Failed to run 'start-pre' task: Permission>
Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: kubelet.service: Failed with result 'resources'.
Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: Failed to start Kubernetes Kubelet.
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: kubelet.service: Service RestartSec=10s expired, scheduling>
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: kubelet.service: Scheduled restart job, restart counter is >
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: Stopped Kubernetes Kubelet.
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: kubelet.service: Consumed 0 CPU time

[root@ip-10-0-183-87 kubernetes]# audit2allow -a


#============= init_t ==============
allow init_t kubernetes_file_t:file read;
[root@ip-10-0-183-87 kubernetes]# grep avc /var/log/audit/audit.log | tail -1 - | audit2why
type=AVC msg=audit(1623953918.958:1790): avc:  denied  { read } for  pid=1 comm="systemd" name="test-env" dev="nvme0n1p4" ino=92295647 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:kubernetes_file_t:s0 tclass=file permissive=0

	Was caused by:
		Missing type enforcement (TE) allow rule.

		You can use audit2allow to generate a loadable module to allow this access.

[root@ip-10-0-183-87 kubernetes]# rpm -q container-selinux
container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.noarch
[root@ip-10-0-183-87 kubernetes]# rpm-ostree status
State: idle
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a9af00365ac9b38d479a6e33bb54fbc3f1150d7903873b9a6f1230a2a7577622
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.84.202106141119-0 (2021-06-14T11:22:36Z)

  ostree://457db8ff03dda5b3ce1a8e242fd91ddbe6a82f838d1b0047c3d4aeaf6c53f572
                   Version: 48.84.202106091622-0 (2021-06-09T16:25:42Z)

[root@ip-10-0-183-87 ~]# rpm -qi container-selinux
Name        : container-selinux
Epoch       : 2
Version     : 2.162.0
Release     : 1.module+el8.4.0+11311+9da8acfb
Architecture: noarch
Install Date: Mon Jun 14 11:21:04 2021
Group       : Unspecified
Size        : 47979
License     : GPLv2
Signature   : RSA/SHA256, Wed Jun  9 06:44:40 2021, Key ID 199e2f91fd431d51
Source RPM  : container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.src.rpm
Build Date  : Tue Jun  8 07:51:55 2021
Build Host  : s390-064.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/containers/container-selinux
Summary     : SELinux policies for container runtimes
Description :
SELinux policy modules for use with container runtimes.
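The triage above reduces to reading the source and target types out of the AVC record. A small helper (hypothetical, shown only to illustrate which fields the denial hinges on) that pulls the type component out of an `scontext=`/`tcontext=` attribute:

```shell
#!/bin/sh
# Extract the SELinux type (third field of user:role:type:level) from an
# AVC record's scontext= or tcontext= attribute. Hypothetical helper for
# triaging denials like the one shown above.
avc_type() {
    # $1: attribute name (scontext|tcontext); AVC line arrives on stdin
    sed -n "s/.*$1=\([^ ]*\).*/\1/p" | cut -d: -f3
}

line='type=AVC msg=audit(1623953918.958:1790): avc:  denied  { read } for  pid=1 comm="systemd" name="test-env" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:kubernetes_file_t:s0 tclass=file permissive=0'

printf '%s\n' "$line" | avc_type scontext    # init_t
printf '%s\n' "$line" | avc_type tcontext    # kubernetes_file_t
```

The pair init_t / kubernetes_file_t is exactly what the `allow init_t kubernetes_file_t:file read;` rule from audit2allow addresses.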

Comment 7 Micah Abbott 2021-06-28 15:49:09 UTC
The fix for this failed and we are tracking the new issue - https://bugzilla.redhat.com/show_bug.cgi?id=1973418

This will have to target 4.9 and get cloned for a 4.8.z backport

Comment 9 Micah Abbott 2021-09-07 14:39:32 UTC
container-selinux-2.167.0-1.module+el8.5.0+12397+bf23b712  landed in RHCOS 49.84.202109031732-0

Marking as MODIFIED

Comment 13 HuijingHei 2021-09-28 03:45:17 UTC
Verify passed with 4.9.0-0.nightly-2021-09-25-094414 and container-selinux-2.167.0-1.module+el8.5.0+12397+bf23b712.noarch


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-25-094414   True        False         4m10s   Cluster version is 4.9.0-0.nightly-2021-09-25-094414

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-shs9k12-f76d1-cfstg-master-0         Ready    master   46m   v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-master-1         Ready    master   46m   v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-master-2         Ready    master   46m   v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-worker-a-t5dpb   Ready    worker   38m   v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-worker-b-66x5q   Ready    worker   38m   v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6   Ready    worker   38m   v1.22.0-rc.0+af080cb

$ oc debug node/ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6
sh-4.4# chroot /host
sh-4.4# rpm -q container-selinux
container-selinux-2.167.0-1.module+el8.5.0+12397+bf23b712.noarch

sh-4.4# echo TEST=foobar > /etc/kubernetes/test
sh-4.4# cat /etc/systemd/system/echo.service
[Unit]
Description=An echo unit
[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/kubernetes/test
ExecStart=/usr/bin/echo ${PAUSE}
[Install]
WantedBy=multi-user.target

sh-4.4# systemctl daemon-reload && systemctl start echo.service
sh-4.4# systemctl status echo.service
● echo.service - An echo unit
   Loaded: loaded (/etc/systemd/system/echo.service; disabled; vendor preset: disabled)
   Active: active (exited) since Tue 2021-09-28 03:36:10 UTC; 5s ago
  Process: 46465 ExecStart=/usr/bin/echo ${PAUSE} (code=exited, status=0/SUCCESS)
 Main PID: 46465 (code=exited, status=0/SUCCESS)
      CPU: 5ms

Sep 28 03:36:10 ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6 systemd[1]: Starting An echo unit...
Sep 28 03:36:10 ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6 systemd[1]: Started An echo unit.

sh-4.4# cat /etc/os-release 
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="49.84.202109241334-0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 49.84.202109241334-0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.9/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.9"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.9"
OPENSHIFT_VERSION="4.9"
RHEL_VERSION="8.4"
OSTREE_VERSION='49.84.202109241334-0'
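As context for the echo.service check above: `EnvironmentFile=` consumes plain KEY=VALUE lines, which a few lines of shell can approximate (a simplified sketch — systemd's real parser also handles quoting, escapes, and continuation lines):

```shell
#!/bin/sh
# Rough approximation of systemd's EnvironmentFile= loading: read KEY=VALUE
# lines, skipping blanks and comments. Simplified sketch only; systemd's
# real parser also handles quoting, escapes, and line continuations.
load_env_file() {
    while IFS= read -r kv; do
        case "$kv" in
            ''|'#'*) continue ;;        # skip blank lines and comments
            *=*)     export "$kv" ;;    # KEY=VALUE -> environment
        esac
    done < "$1"
}

printf 'TEST=foobar\n' > /tmp/test-env
load_env_file /tmp/test-env
echo "$TEST"    # foobar
```

This is why the verification only needed a trivial `echo TEST=foobar > /etc/kubernetes/test` — once systemd can read the kubernetes_file_t file at all, the parsing itself is unremarkable.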

Comment 15 errata-xmlrpc 2021-10-18 17:33:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

