Bug 1969998
| Summary: | [OCP 4.9 tracker] kubelet service fail to load EnvironmentFile due to SELinux denial | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Micah Abbott <miabbott> |
| Component: | RHCOS | Assignee: | Micah Abbott <miabbott> |
| Status: | CLOSED ERRATA | QA Contact: | HuijingHei <hhei> |
| Severity: | medium | Docs Contact: | jfrye |
| Priority: | urgent | | |
| Version: | 4.8 | CC: | aos-bugs, dornelas, dwalsh, ercohen, itsoiref, jfrye, jligon, jnovy, jparrill, keyoung, lvrabec, mavazque, miabbott, mmalik, mrussell, nstielau, pablo.iranzo, plautrba, rfreiman, shardy, ssekidde, tsweeney, walters, weshen, ypu, zpytela |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: The SELinux policy disallowed systemd from reading files under /etc/kubernetes.<br>Consequence: The kubelet fails to start.<br>Fix: Update the SELinux policy to allow systemd to read files labeled kubernetes_file_t.<br>Result: The kubelet starts successfully. | Story Points: | --- |
| Clone Of: | 1957840 | Environment: | |
| Last Closed: | 2021-10-18 17:33:22 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1960769, 1973418, 2005018 | | |
| Bug Blocks: | 1958966 | | |
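The Doc Text above boils down to one missing type-enforcement rule: systemd (which runs as `init_t`) could not read files labeled `kubernetes_file_t`. As a sketch only — the real fix shipped inside the `container-selinux` package, and the module name `local_kubelet` here is hypothetical — an administrator-built local policy module granting the equivalent access would look roughly like:

```
# local_kubelet.te -- hypothetical local module name; a sketch of the
# equivalent rule, not the actual container-selinux change.
module local_kubelet 1.0;

require {
    type init_t;
    type kubernetes_file_t;
    class file { getattr open read };
}

# Allow systemd (init_t) to read EnvironmentFile= files under
# /etc/kubernetes, which carry the kubernetes_file_t label.
allow init_t kubernetes_file_t:file { getattr open read };
```

Such a module can be generated directly from the live denials with `audit2allow -a -M local_kubelet` and loaded with `semodule -i local_kubelet.pp`. The permission set is an assumption: the captured AVC below shows only `read`, but consuming a file generally also requires `open` and `getattr`, which would surface as follow-on denials.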
Comment 1
Micah Abbott
2021-06-09 15:34:38 UTC
Containers QE signed off on the build in 1960769 (container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.noarch).

Requesting ART to tag it into the RHAOS 4.8 Brew tag: https://issues.redhat.com/browse/ART-3026

...but they don't have perms, so https://projects.engineering.redhat.com/browse/CLOUDBLD-6031

We got mixed up somewhere and the tagging wasn't required. The necessary `container-selinux` build was shipped as part of https://access.redhat.com/errata/RHSA-2021:2371 and RHCOS consumed it as it normally does. RHCOS 48.84.202106100957-0 was the first to have it.

I am still getting the denial on 4.8.0-0.nightly-2021-06-16-190035 with the container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.noarch:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.8.0-0.nightly-2021-06-16-190035 True False 3h46m Cluster version is 4.8.0-0.nightly-2021-06-16-190035
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-134-236.us-west-2.compute.internal Ready master 4h12m v1.21.0-rc.0+120883f
ip-10-0-150-206.us-west-2.compute.internal Ready worker 4h7m v1.21.0-rc.0+120883f
ip-10-0-164-27.us-west-2.compute.internal Ready master 4h12m v1.21.0-rc.0+120883f
ip-10-0-183-87.us-west-2.compute.internal Ready worker 4h5m v1.21.0-rc.0+120883f
ip-10-0-210-154.us-west-2.compute.internal Ready master 4h13m v1.21.0-rc.0+120883f
ip-10-0-222-34.us-west-2.compute.internal Ready worker 4h6m v1.21.0-rc.0+120883f
$ oc debug node/ip-10-0-183-87.us-west-2.compute.internal
Starting pod/ip-10-0-183-87us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd /etc/kubernetes/
sh-4.4# ls
ca.crt cni kubelet-ca.crt kubelet.conf static-pod-resources
cloud.conf kubeconfig kubelet-plugins manifests
sh-4.4# echo TEST=TEST > test-env
sh-4.4# ls -laZ
total 40
drwxr-xr-x. 6 root root system_u:object_r:kubernetes_file_t:s0 193 Jun 17 17:59 .
drwxr-xr-x. 96 root root system_u:object_r:etc_t:s0 8192 Jun 17 17:25 ..
-rw-r--r--. 1 root root system_u:object_r:kubernetes_file_t:s0 1123 Jun 17 17:25 ca.crt
-rw-r--r--. 1 root root system_u:object_r:kubernetes_file_t:s0 0 Jun 17 17:25 cloud.conf
drwxr-xr-x. 3 root root system_u:object_r:kubernetes_file_t:s0 19 Jun 17 13:48 cni
-rw-r--r--. 1 root root system_u:object_r:kubernetes_file_t:s0 6050 Jun 17 13:46 kubeconfig
-rw-r--r--. 1 root root system_u:object_r:kubernetes_file_t:s0 5875 Jun 17 17:25 kubelet-ca.crt
drwxr-xr-x. 3 root root system_u:object_r:kubernetes_file_t:s0 20 Jun 17 13:48 kubelet-plugins
-rw-r--r--. 1 root root system_u:object_r:kubernetes_file_t:s0 1076 Jun 17 17:25 kubelet.conf
drwxr-xr-x. 2 root root system_u:object_r:kubernetes_file_t:s0 6 Jun 17 13:49 manifests
drwxr-xr-x. 3 root root system_u:object_r:kubernetes_file_t:s0 24 Jun 17 13:48 static-pod-resources
-rw-r--r--. 1 root root system_u:object_r:kubernetes_file_t:s0 10 Jun 17 17:59 test-env
sh-4.4# vi /etc/systemd/system/kubelet.service
sh-4.4# audit2allow -a
sh-4.4# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
After=ostree-finalize-staged.service
[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env
EnvironmentFile=/etc/kubernetes/test-env
EnvironmentFile=/etc/node-sizing.env
ExecStart=/usr/bin/hyperkube \
kubelet \
--config=/etc/kubernetes/kubelet.conf \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--container-runtime=remote \
--container-runtime-endpoint=/var/run/crio/crio.sock \
--runtime-cgroups=/system.slice/crio.service \
--node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
--node-ip=${KUBELET_NODE_IP} \
--minimum-container-ttl-duration=6m0s \
--volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
--cloud-provider=aws \
\
--pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa0f2cad0e8d907a10bf91b2fe234659495a694235a9e2ef7015eb450ce9f1ba \
--system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY} \
--v=${KUBELET_LOG_LEVEL}
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
sh-4.4# systemctl daemon-reload
sh-4.4# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-mco-default-madv.conf, 20-logging.conf
Active: active (running) since Thu 2021-06-17 17:25:38 UTC; 35min ago
Main PID: 1400 (kubelet)
Tasks: 16 (limit: 48468)
Memory: 201.0M
CPU: 2min 27.198s
CGroup: /system.slice/kubelet.service
└─1400 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/ku>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.109653 1400 scope.go:111] "RemoveContai>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: E0617 18:01:04.109979 1400 remote_runtime.go:334] "Con>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.110010 1400 pod_container_deletor.go:52>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.198000 1400 reconciler.go:196] "operati>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.203632 1400 operation_generator.go:829]>
Jun 17 18:01:04 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:04.298840 1400 reconciler.go:319] "Volume >
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.110755 1400 kubelet.go:1960] "SyncLoop >
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.116113 1400 kubelet.go:1954] "SyncLoop >
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.116180 1400 kubelet.go:2153] "Failed to>
Jun 17 18:01:05 ip-10-0-183-87 hyperkube[1400]: I0617 18:01:05.116181 1400 reflector.go:225] Stopping >
sh-4.4# systemctl restart kubelet
Removing debug pod ...
== I get booted from the debug pod here because kubelet is gone. SSH back in through bastion ==
$ ./ssh.sh ip-10-0-183-87.us-west-2.compute.internal
Warning: Permanently added 'ip-10-0-183-87.us-west-2.compute.internal' (ECDSA) to the list of known hosts.
[root@ip-10-0-183-87 kubernetes]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-mco-default-madv.conf, 20-logging.conf
Active: inactive (dead) (Result: resources) since Thu 2021-06-17 18:01:34 UTC; 3ms ago
Process: 1400 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap>
Main PID: 1400 (code=exited, status=0/SUCCESS)
CPU: 0
Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: kubelet.service: Failed to load environment files: Permissi>
Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: kubelet.service: Failed to run 'start-pre' task: Permission>
Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: kubelet.service: Failed with result 'resources'.
Jun 17 18:01:24 ip-10-0-183-87 systemd[1]: Failed to start Kubernetes Kubelet.
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: kubelet.service: Service RestartSec=10s expired, scheduling>
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: kubelet.service: Scheduled restart job, restart counter is >
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: Stopped Kubernetes Kubelet.
Jun 17 18:01:34 ip-10-0-183-87 systemd[1]: kubelet.service: Consumed 0 CPU time
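The "Failed to load environment files: Permission" failure above comes from the `EnvironmentFile=/etc/kubernetes/test-env` line injected into the unit earlier. Per systemd.exec(5), a leading `-` on an `EnvironmentFile=` path marks the file optional, so a missing file is skipped silently; the injected line deliberately omits it, so any failure to read the file — including this SELinux denial — fails the unit. A minimal fragment illustrating the distinction (paths taken from the unit above):

```
[Service]
# Optional: silently skipped if the file is absent.
EnvironmentFile=-/etc/kubernetes/kubelet-env
# Mandatory: the unit fails to start if the file cannot be read.
EnvironmentFile=/etc/kubernetes/test-env
```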
[root@ip-10-0-183-87 kubernetes]# audit2allow -a
#============= init_t ==============
allow init_t kubernetes_file_t:file read;
[root@ip-10-0-183-87 kubernetes]# grep avc /var/log/audit/audit.log | tail -1 - | audit2why
type=AVC msg=audit(1623953918.958:1790): avc: denied { read } for pid=1 comm="systemd" name="test-env" dev="nvme0n1p4" ino=92295647 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:kubernetes_file_t:s0 tclass=file permissive=0
Was caused by:
Missing type enforcement (TE) allow rule.
You can use audit2allow to generate a loadable module to allow this access.
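The AVC record above can also be triaged mechanically. A minimal sketch (assuming the standard `scontext=`/`tcontext=`/`denied { ... }` layout of an audit AVC line, with the sample record hardcoded from the log above) that pulls out the source type, target type, and denied permission:

```shell
# Sample AVC record, copied from the audit.log capture above.
avc='type=AVC msg=audit(1623953918.958:1790): avc: denied { read } for pid=1 comm="systemd" name="test-env" dev="nvme0n1p4" ino=92295647 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:kubernetes_file_t:s0 tclass=file permissive=0'

# The denied permission sits between "{" and "}".
perm=$(printf '%s\n' "$avc" | sed 's/.*denied *{ *\([^}]*\)}.*/\1/' | tr -d ' ')
# SELinux types are the third colon-separated field of each context.
stype=$(printf '%s\n' "$avc" | sed 's/.*scontext=[^:]*:[^:]*:\([^:]*\):.*/\1/')
ttype=$(printf '%s\n' "$avc" | sed 's/.*tcontext=[^:]*:[^:]*:\([^:]*\):.*/\1/')

echo "$stype -> $ttype : $perm"   # -> init_t -> kubernetes_file_t : read
```

This confirms at a glance that PID 1 (systemd, `init_t`) was denied `read` on a `kubernetes_file_t` file — exactly the rule the policy fix adds.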
[root@ip-10-0-183-87 kubernetes]# rpm -q container-selinux
container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.noarch
[root@ip-10-0-183-87 kubernetes]# rpm-ostree status
State: idle
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a9af00365ac9b38d479a6e33bb54fbc3f1150d7903873b9a6f1230a2a7577622
CustomOrigin: Managed by machine-config-operator
Version: 48.84.202106141119-0 (2021-06-14T11:22:36Z)
ostree://457db8ff03dda5b3ce1a8e242fd91ddbe6a82f838d1b0047c3d4aeaf6c53f572
Version: 48.84.202106091622-0 (2021-06-09T16:25:42Z)
[root@ip-10-0-183-87 ~]# rpm -qi container-selinux
Name : container-selinux
Epoch : 2
Version : 2.162.0
Release : 1.module+el8.4.0+11311+9da8acfb
Architecture: noarch
Install Date: Mon Jun 14 11:21:04 2021
Group : Unspecified
Size : 47979
License : GPLv2
Signature : RSA/SHA256, Wed Jun 9 06:44:40 2021, Key ID 199e2f91fd431d51
Source RPM : container-selinux-2.162.0-1.module+el8.4.0+11311+9da8acfb.src.rpm
Build Date : Tue Jun 8 07:51:55 2021
Build Host : s390-064.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor : Red Hat, Inc.
URL : https://github.com/containers/container-selinux
Summary : SELinux policies for container runtimes
Description :
SELinux policy modules for use with container runtimes.
The fix for this failed and we are tracking the new issue - https://bugzilla.redhat.com/show_bug.cgi?id=1973418

This will have to target 4.9 and get cloned for a 4.8.z backport.

container-selinux-2.167.0-1.module+el8.5.0+12397+bf23b712 landed in RHCOS 49.84.202109031732-0. Marking as MODIFIED.

Verify passed with 4.9.0-0.nightly-2021-09-25-094414 and container-selinux-2.167.0-1.module+el8.5.0+12397+bf23b712.noarch:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.nightly-2021-09-25-094414 True False 4m10s Cluster version is 4.9.0-0.nightly-2021-09-25-094414
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-shs9k12-f76d1-cfstg-master-0 Ready master 46m v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-master-1 Ready master 46m v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-master-2 Ready master 46m v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-worker-a-t5dpb Ready worker 38m v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-worker-b-66x5q Ready worker 38m v1.22.0-rc.0+af080cb
ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6 Ready worker 38m v1.22.0-rc.0+af080cb
$ oc debug node/ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6
sh-4.4# chroot /host
sh-4.4# rpm -q container-selinux
container-selinux-2.167.0-1.module+el8.5.0+12397+bf23b712.noarch
sh-4.4# echo TEST=foobar > /etc/kubernetes/test
sh-4.4# cat /etc/systemd/system/echo.service
[Unit]
Description=An echo unit
[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/kubernetes/test
ExecStart=/usr/bin/echo ${PAUSE}
[Install]
WantedBy=multi-user.target
sh-4.4# systemctl daemon-reload && systemctl start echo.service
sh-4.4# systemctl status echo.service
● echo.service - An echo unit
Loaded: loaded (/etc/systemd/system/echo.service; disabled; vendor preset: disabled)
Active: active (exited) since Tue 2021-09-28 03:36:10 UTC; 5s ago
Process: 46465 ExecStart=/usr/bin/echo ${PAUSE} (code=exited, status=0/SUCCESS)
Main PID: 46465 (code=exited, status=0/SUCCESS)
CPU: 5ms
Sep 28 03:36:10 ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6 systemd[1]: Starting An echo unit...
Sep 28 03:36:10 ci-ln-shs9k12-f76d1-cfstg-worker-c-qc4h6 systemd[1]: Started An echo unit.
sh-4.4# cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="49.84.202109241334-0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 49.84.202109241334-0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.9/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.9"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.9"
OPENSHIFT_VERSION="4.9"
RHEL_VERSION="8.4"
OSTREE_VERSION='49.84.202109241334-0'
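The verification above hinges on the node carrying container-selinux at or above the fixed version, 2.167.0. A minimal sketch of that check (the `version_ge` helper is hypothetical; it leans on GNU `sort -V` for version-aware comparison):

```shell
# Sketch: is the installed container-selinux version at least the one
# that carries the fix (2.167.0, per the verification above)?
version_ge() {
    # Succeeds if $1 >= $2: version-sort both and check that the
    # minimum of the pair is the threshold $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

installed="2.167.0"   # e.g. from: rpm -q --qf '%{VERSION}' container-selinux
fixed="2.167.0"

if version_ge "$installed" "$fixed"; then
    echo "container-selinux $installed includes the fix"
else
    echo "container-selinux $installed predates the fix"
fi
```

With the broken build from the earlier comments (2.162.0) the check fails; with 2.167.0 or later it passes.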
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759