Description of problem:
The fluentd pods are causing "Failed to set up mount unit: Invalid argument" errors multiple times per second. If the logging operator (fluentd) is removed, the problem resolves itself.

[root@au1-ocpinf-d01 ~]# journalctl --since "1 days ago" | grep "Invalid argument"
Sep 25 19:33:12 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument
Sep 25 19:33:12 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument
Sep 25 19:33:13 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument
Sep 25 19:33:13 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument

Version-Release number of selected component (if applicable):
OCP 4.4
Logging Operator: 4.4.0-202008210157.p0 provided by Red Hat, Inc.

How reproducible:
I was unable to reproduce the issue, but the customer has been able to on 3 of his 4.4 clusters.

Steps to Reproduce:
1. Install the logging operator.
2. Allow data to populate.
3. Check the journal.

Actual results:
The journal is flooded with the above error message.
Setting priority to low. Investigation of the must-gather shows the logging system in a healthy state.
Working with the storage team, we were pointed to:

https://access.redhat.com/solutions/5038151
https://bugzilla.redhat.com/show_bug.cgi?id=1779813

There is nothing that can be done from the logging perspective to explicitly resolve the issue.
We can perhaps do something on the storage side. I can see the elasticsearch-cdm-7fc52t3q-2-5dd6cf7dbc-bfnvj pod running on node au1-ocpinf-d02.ocp4-lab.sarc.samsung.com. It uses PVC elasticsearch-elasticsearch-cdm-7fc52t3q-2, which is mounted on the node as:

/dev/sdb on /var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk type ext4 (rw,relatime,seclabel)

$ systemd-escape /var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk | wc -c
258

So it is over the systemd limit, and systemd spams the log. The directory name must be shorter. Breaking the path down:

- "ocp4-lab-t82zt" is the cluster prefix; I don't know whether the customer can make it shorter.
- "dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk" is hardcoded in Kubernetes.
- "11bc26b0-694e-4917-9e80-f9919c8df059" is the UUID of the volume (or the datastore?) and is hardcoded in Kubernetes.
- "[NIM-ESX-VVOL-OCP-LAB] rfc4122" comes from the datastore + folder name. Can the customer use one with a shorter name / fewer dashes? systemd escapes every "-" as 4 characters ("\x2d"). They only need to save a few characters to get under the limit.

On the OCP / Kubernetes side, we will try to fix the vSphere code not to depend on the datastore name and to always produce shorter directory names. This will take some time, though.

Just to note: all pods are actually running, so Elasticsearch should work. systemd just spams the log in the background.
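The length check above can be sketched in Python. This is a rough approximation of systemd's escaping rules (see systemd-escape(1)), not the real implementation: "/" maps to "-", and every character outside [A-Za-z0-9:_.] is hex-escaped as "\xNN", so each "-" in the path costs 4 characters in the escaped unit name. The path and the 255-character unit-name limit are from this bug; the numbers this sketch prints are an estimate, not the exact wc output above.

```python
# Rough sketch of systemd's unit-name escaping; an approximation,
# not systemd's actual implementation.
def systemd_escape(path: str) -> str:
    out = []
    for ch in path:
        if ch == "/":
            out.append("-")                  # "/" becomes "-"
        elif ch.isalnum() or ch in ":_.":
            out.append(ch)                   # allowed characters pass through
        else:
            out.append("\\x%02x" % ord(ch))  # e.g. "-" -> "\x2d", " " -> "\x20"
    return "".join(out)

UNIT_NAME_LIMIT = 255  # systemd rejects longer unit names with "Invalid argument"

# The mount path from this bug's must-gather.
path = ("/var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/"
        "[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/"
        "ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk")

escaped = systemd_escape(path)
print(len(escaped), len(escaped) > UNIT_NAME_LIMIT)

# Replacing a "-" in the datastore/folder name with an alphanumeric
# character saves 3 escaped characters; dropping it outright saves 4.
# A rename with fewer dashes can bring the name back under the limit.
```

This also shows why renaming the datastore helps: the raw path is well under 255 characters, and it is only the 4-character "\x2d" escapes that push the mount unit name over the limit.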
I have given up on trying to drop the UUID of the folder from the volume path; that is too risky and can break things all over the place. I am going for a simpler approach of reducing the prefix size: https://github.com/kubernetes/kubernetes/pull/96533

This should *somewhat* help with longer volume names that are on the boundary of 255 characters (like the one reported in this bug). For other cases, we will have to document and suggest recommendations to the customer.
*** Bug 1939416 has been marked as a duplicate of this bug. ***
I also filed a related systemd issue for this - https://bugzilla.redhat.com/show_bug.cgi?id=1940973
*** Bug 1940898 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days