1730042 – [RFE] DiskPressure monitoring for /var/log when using separate partition

Summary: [RFE] DiskPressure monitoring for /var/log when using separate partition

Keywords:
Status:	CLOSED DUPLICATE of bug 1574866
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Seth Jennings
QA Contact:	Jianwei Hou
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-15 16:53 UTC by Hugo Cisneiros (Eitch)
Modified:	2019-07-23 16:31 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-07-23 16:31:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1574866	0	unspecified	CLOSED	[RFE] Monitor of /var/log with external mount for DiskPressure on the node	2022-03-13 14:57:33 UTC

Description Hugo Cisneiros (Eitch) 2019-07-15 16:53:26 UTC

Description of problem:
Customer has /var/log mounted as a separate partition. When that partition fills up, any container scheduled on the node gets stuck in ContainerCreating status because there is no space left on the device.

* Even with /var/log full, the node still has "Ready" status, so new pods keep getting stuck there instead of being scheduled to another node.
* Even if the pod outputs nothing and does not need any log space, it still gets stuck.

Version-Release number of selected component (if applicable):
3.11.x

How reproducible:
Always.

Steps to Reproduce:
1. Mount /var/log as a separate partition in any OpenShift cluster node;
2. Fill up all of the partition's space;
3. Try to create a pod in the cluster: it gets stuck in ContainerCreating status instead of scheduling to another node. Error is: "Failed create pod sandbox: mkdir /var/log/pods/<uid>: no space left on device"

Actual results:

* The cluster cannot create pods on the node; they stay stuck in ContainerCreating. This requires manual intervention by a cluster administrator.
* The node is still in "Ready" state.

Expected results:

* Pods should be created, since they do not actually depend on /var/log to run;
* If the kubelet requires /var/log space to create pods, it should monitor that partition.
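
As an illustration of the kind of check being requested, here is a minimal sketch of an external free-space probe for the partition holding /var/log. It is not the kubelet's DiskPressure mechanism: the function name log_partition_pressure and the 10% threshold are assumptions made for this example, and the probe only reports the condition without changing the node's status.

```python
import shutil


def log_partition_pressure(path="/var/log", min_free_fraction=0.10):
    """Return (under_pressure, free_fraction) for the filesystem holding `path`.

    `min_free_fraction` is an assumed threshold for this sketch,
    not a kubelet eviction setting.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Guard against pseudo-filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return free_fraction < min_free_fraction, free_fraction
```

Run on a node, log_partition_pressure() returns a boolean plus the free fraction of the /var/log filesystem, which an administrator could feed into existing alerting until the partition is cleaned up.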

Comment 2 Hugo Cisneiros (Eitch) 2019-07-15 17:22:35 UTC

Example of contents from /var/log/pods:

# ls -lha /var/log/pods/bbf67ae8-a722-11e9-ae46-525400575a3e/app-cli/
total 0
drwxr-xr-x. 2 root root  19 Jul 15 13:05 .
drwxr-xr-x. 3 root root  21 Jul 15 13:05 ..
lrwxrwxrwx. 1 root root 165 Jul 15 13:05 0.log -> /var/lib/docker/containers/4c6e1da83b270d21197ff04be743b02de71327603657113d0dfd0781931ad50d/4c6e1da83b270d21197ff04be743b02de71327603657113d0dfd0781931ad50d-json.log
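
The listing above shows that the entry under /var/log/pods is only a symlink into /var/lib/docker/containers; its size of 165 bytes is simply the length of the target path. A minimal sketch for listing such entries across a whole pods directory follows. The helper name describe_pod_logs is hypothetical, and the directory layout is assumed to match the listing above.

```python
import os


def describe_pod_logs(pods_dir):
    """Yield (entry_path, symlink_target_or_None, size_in_bytes) per file under pods_dir."""
    for root, _dirs, files in os.walk(pods_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # readlink returns the stored target, even if that target no longer exists.
            target = os.readlink(path) if os.path.islink(path) else None
            # lstat reports the size of the link itself, not of the file it points to.
            size = os.lstat(path).st_size
            yield path, target, size
```

Calling list(describe_pod_logs("/var/log/pods")) on an affected node would show how little of the /var/log partition the pod log entries themselves consume.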

Comment 3 Greg Blomquist 2019-07-23 14:08:30 UTC

It appears that a previous RFE for this same request was closed by OCP Product Management.  I think this one will go the same way unless there's a clear demonstrable impact to a large number of customers.

Also, looking at the case, it looks like a possible workaround was suggested to the customer for now.

I think we should evaluate whether this is an appropriate feature for 4.x (i.e., is 4.x still impacted by this?).  And, if so, move this to Jira for tracking in the Node team's feature backlog.

Seth, thoughts on this in 4.x?

Comment 4 Seth Jennings 2019-07-23 16:31:19 UTC

Closing as a duplicate of the closed RFE.  It isn't that we wouldn't like to do something about this, but we can't implement this in 3.x z-streams, and for 4.x it isn't an issue because custom partition layouts are not supported in RHCOS.

*** This bug has been marked as a duplicate of bug 1574866 ***
