2052702 – multicluster_operators_hub_subscription issues due to /tmp usage


Summary: multicluster_operators_hub_subscription issues due to /tmp usage

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Advanced Cluster Management for Kubernetes
Classification:	Red Hat
Component:	App Lifecycle
Sub Component:
Version:	rhacm-2.4.z
Hardware:	x86_64
OS:	Other
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	rhacm-2.4.5
Assignee:	Mike Ng
QA Contact:
Docs Contact:	bswope@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-02-09 19:57 UTC by Yerzhan Beisembayev
Modified:	2022-06-27 17:04 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-06-27 17:04:01 UTC
Target Upstream Version:
Embargoed:
Flags:	juhsu: rhacm-2.4.z+ juhsu: rhacm-2.5+


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	stolostron backlog issues 19823	0	None	None	None	2022-02-09 23:33:14 UTC
Red Hat Product Errata	RHSA-2022:5201	0	None	None	None	2022-06-27 17:04:31 UTC

Description Yerzhan Beisembayev 2022-02-09 19:57:15 UTC

Description of the problem:

The multicluster_operators_hub_subscription pod has various issues caused by constantly increasing /tmp storage usage.

This includes:
- Error: Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded: error reserving ctr name k8s_multicluster-operators-hub-subscription_multicluster-operators-hub-subscription-..._open-cluster-management_... for id 3844...c6b9: name is reserved

- The pod is constantly evicted with the error:
The node was low on resource: ephemeral-storage. Container multicluster-operators-hub-subscription was using 16392Ki, which exceeds its request of 0.

- The pod logs errors similar to:
E0209 13:08:19.899823 1 gitrepo_sync.go:388] lstat /tmp/data-services/release/internal: no such file or directoryFailed to sort kubernetes resources and helm charts.
E0209 13:08:19.899861 1 gitrepo_sync.go:112] lstat /tmp/data-services/release/internal: no such file or directory
E0209 13:08:19.899888 1 mcmhub_controller.go:626] subscription-hub-reconciler "msg"="failed to process on doMCMHubReconcile" "error"="lstat /tmp/data-services/release/internal: no such file or directory"

Release version:
ACM v2.4.1 (OCP 4.8 (ARO))

Operator snapshot version:

OCP version:
4.8.29 (ARO)

Browser Info:

Steps to reproduce:
1. Configure applications to be deployed via HelmRepo channel
2. Access the multicluster_operators_hub_subscription pod via rsh
3. Check the /tmp folder and observe that subfolders named charts* are constantly generated

Actual results:
New charts* subfolders are constantly created in the /tmp folder. Even in a moderate environment (6 managed clusters, ~30 apps), /tmp runs out of free space in just a few days (~70G, 250k charts* folders in about 4 days).
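
To put numbers on the growth described above, a small script along these lines could be run twice, a few hours apart, and the results compared. This is only a sketch for illustration: the function name summarize_chart_dirs is made up here, the "charts" prefix is taken from the folder names reported above, and it assumes a Python 3 interpreter is available wherever /tmp (or a copy of it) can be read, which may not be the case inside the pod itself.

```python
import os
import sys
from pathlib import Path


def summarize_chart_dirs(root="/tmp", prefix="charts"):
    """Return (count, total_bytes) for directories directly under root
    whose names start with prefix. Hypothetical helper, not part of ACM."""
    count = 0
    total_bytes = 0
    for entry in Path(root).iterdir():
        if not (entry.is_dir() and entry.name.startswith(prefix)):
            continue
        count += 1
        # Sum the sizes of all files below this charts* directory.
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                try:
                    total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # The file disappeared while scanning; skip it.
                    pass
    return count, total_bytes


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    count, total = summarize_chart_dirs(root)
    print(f"{count} charts* directories, {total / 2**20:.1f} MiB under {root}")
```

The reported byte total is the sum of file sizes, so it will be somewhat lower than the on-disk usage shown by tools that count allocated blocks.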

Expected results:
/tmp usage does not grow

Additional info:
@Roke Jung proposed a solution: set the CHARTS_DIR environment variable (value "/tmp"), which seems to help with the problem.
I think this should be included out of the box.

Comment 8 errata-xmlrpc 2022-06-27 17:04:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.4.5 security updates and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5201
