Bug 2001442 - Empty termination.log file for the kube-apiserver has too permissive mode [NEEDINFO]
Summary: Empty termination.log file for the kube-apiserver has too permissive mode
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.9
Hardware: All
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Rahul Gangwar
URL:
Whiteboard: LifecycleReset
Depends On:
Blocks:
 
Reported: 2021-09-06 06:43 UTC by Juan Antonio Osorio
Modified: 2022-03-10 16:07 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:07:08 UTC
Target Upstream Version:
Embargoed:
mfojtik: needinfo?




Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 1096 0 None open Bug 2001442: empty termination.log file for the kube-apiserver has too permissive mode 2021-12-21 12:33:12 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:07:24 UTC

Description Juan Antonio Osorio 2021-09-06 06:43:27 UTC
Description of problem:

In some cases the termination.log file for the kube-apiserver has a very permissive mode:

$ oc debug -q node/ip-10-0-168-116.ec2.internal -- ls -l /host/var/log/kube-apiserver  
total 106436
-rw-------. 1 root root 104856280 Sep  6 06:25 audit-2021-09-06T06-25-44.636.log
-rw-------. 1 root root   2133241 Sep  6 06:26 audit.log
-rw-r--r--. 1 root root         4 Sep  6 05:54 termination.log

This doesn't seem to be the case for all nodes though...

$ oc debug -q node/ip-10-0-157-35.ec2.internal -- ls -l /host/var/log/kube-apiserver
total 109980
-rw-------. 1 root root 104856320 Sep  6 06:27 audit-2021-09-06T06-27-29.680.log
-rw-------. 1 root root   2418656 Sep  6 06:28 audit.log
-rw-------. 1 root root   3629299 Sep  6 05:53 termination.log

It appears that this happens if the termination.log is empty:

$ oc debug -q node/ip-10-0-168-116.ec2.internal -- cat /host/var/log/kube-apiserver/termination.log
---
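
For context, here is a general shell illustration (run locally, not the apiserver's actual file-creation path) of why a freshly created file often ends up world-readable: under the default root umask of 022 a new file gets mode 0644, which matches the mode seen on the nearly empty termination.log above.

# General illustration only, not an analysis of the apiserver code:
# a file created under umask 022 gets mode 0644 (-rw-r--r--).
$ umask 022
$ touch /tmp/termination-example.log
$ ls -l /tmp/termination-example.log   # -rw-r--r--. 1 root root 0 ... /tmp/termination-example.log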

While this is not a critical issue (no critical data is being leaked), it confuses folks doing compliance checks, either manually using our security guide or automatically using the Compliance Operator, since they get a failure on the node saying that the permissions are wrong.
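
As a rough sketch of the kind of per-node check those compliance scans perform (a hypothetical script, not the actual Compliance Operator rule; it assumes GNU find on the node and the standard master node label), anything under the kube-apiserver log directory with a group or other permission bit set would get flagged:

$ for node in $(oc get nodes -l node-role.kubernetes.io/master= -o name); do
    echo "== ${node} =="
    # -perm /077 matches files whose mode is more permissive than 0600,
    # i.e. any group or other permission bit is set
    oc debug -q "${node}" -- find /host/var/log/kube-apiserver -type f -perm /077
  done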

Version-Release number of selected component (if applicable):

Client Version: 4.9.0-0.nightly-2021-08-31-123131
Server Version: 4.9.0-0.nightly-2021-08-31-123131
Kubernetes Version: v1.22.0-rc.0+1199c36

How reproducible:

Often. I have no exact numbers, but it has been a recurrent failure in our compliance CI.


Steps to Reproduce:
1. Deploy a 4.9 cluster
2. Run the following command on each master/control-plane node: oc debug -q node/$NODE -- ls -l /host/var/log/kube-apiserver

Actual results:

The permissions for the termination.log file on one of the nodes will be too permissive:

-rw-------. 1 root root 104856280 Sep  6 06:25 audit-2021-09-06T06-25-44.636.log
-rw-------. 1 root root   2133241 Sep  6 06:26 audit.log
-rw-r--r--. 1 root root         4 Sep  6 05:54 termination.log


Expected results:

All of the files in the audit log directory have secure and restrictive permissions (0600):

-rw-------. 1 root root 104856320 Sep  6 06:27 audit-2021-09-06T06-27-29.680.log
-rw-------. 1 root root   2418656 Sep  6 06:28 audit.log
-rw-------. 1 root root   3629299 Sep  6 05:53 termination.log

Additional info:

While the termination log has been available since at least 4.8, we only started seeing constant CI failures in 4.9.
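
One possible manual workaround until a fix lands (not from the report, and it only tightens a file that has already been created with the wrong mode, it does not prevent recurrence) is to chmod the file on the affected node:

$ oc debug -q node/ip-10-0-168-116.ec2.internal -- chmod 0600 /host/var/log/kube-apiserver/termination.log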

Comment 1 Michal Fojtik 2021-10-22 13:01:58 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 2 Juan Antonio Osorio 2021-10-25 05:00:11 UTC
This is still an issue.

Comment 3 Lukasz Szaszkiewicz 2021-11-05 08:44:06 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 4 Michal Fojtik 2021-11-24 05:09:29 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 6 Michal Fojtik 2022-01-14 06:11:46 UTC
The LifecycleStale keyword was removed because the bug moved to QE.
The bug assignee was notified.

Comment 8 Rahul Gangwar 2022-01-17 12:31:07 UTC
oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-17-023213   True        False         106m    Cluster version is 4.10.0-0.nightly-2022-01-17-023213


for i in `oc get node|grep -i master|cut -d " " -f1`; do echo $i;oc debug -q node/$i -- ls -l /host/var/log/kube-apiserver;done;
ip-10-0-155-212.us-east-2.compute.internal
total 32708
-rw-------. 1 root root 25241426 Jan 17 12:27 audit.log
-rw-------. 1 root root        4 Jan 17 10:38 termination.log
ip-10-0-178-70.us-east-2.compute.internal
total 579140
-rw-------. 1 root root 104857178 Jan 17 10:37 audit-2022-01-17T10-37-32.831.log
-rw-------. 1 root root 104856335 Jan 17 11:01 audit-2022-01-17T11-01-09.105.log
-rw-------. 1 root root 104857407 Jan 17 11:26 audit-2022-01-17T11-26-55.326.log
-rw-------. 1 root root 104857318 Jan 17 11:52 audit-2022-01-17T11-52-27.558.log
-rw-------. 1 root root 104856579 Jan 17 12:17 audit-2022-01-17T12-17-59.899.log
-rw-------. 1 root root  40125051 Jan 17 12:27 audit.log
-rw-------. 1 root root   1707092 Jan 17 10:37 termination.log
ip-10-0-203-61.us-east-2.compute.internal
total 60228
-rw-------. 1 root root 45709411 Jan 17 12:27 audit.log
-rw-------. 1 root root        4 Jan 17 10:39 termination.log

Comment 11 errata-xmlrpc 2022-03-10 16:07:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

