Bug 1884800 - Failed to set up mount unit: Invalid argument
Summary: Failed to set up mount unit: Invalid argument
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Hemant Kumar
QA Contact: Wei Duan
URL:
Whiteboard:
Duplicates: 1939416 1940898
Depends On:
Blocks: 1915520
Reported: 2020-10-02 19:57 UTC by dtarabor
Modified: 2023-12-15 19:39 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:33:30 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kubernetes pull 461 | 0 | None | closed | Bug 1884800: Reduce volume name length for vsphere | 2021-02-03 07:21:05 UTC
Github | openshift kubernetes pull 701 | 0 | None | open | Bug 1884800: Reduce names of vsphere volumes even further | 2021-04-29 20:00:05 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:34:13 UTC

Description dtarabor 2020-10-02 19:57:04 UTC
Description of problem:

* fluentd pods are causing "Failed to set up mount unit: Invalid argument" errors multiple times per second.  

If the logging operator (fluentd) is removed, the problem resolves itself.

[root@au1-ocpinf-d01 ~]# journalctl --since "1 days ago" | grep "Invalid argument"
Sep 25 19:33:12 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument
Sep 25 19:33:12 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument
Sep 25 19:33:13 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument
Sep 25 19:33:13 au1-ocpinf-d01.ocp4-lab.sarc.samsung.com systemd[1]: Failed to set up mount unit: Invalid argument
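
To put a number on "multiple times per second", a minimal Python sketch can count matching journal lines per timestamp. It assumes the default short journalctl output shown above, where the first 15 characters of each line are the timestamp; the function name `errors_per_second` is illustrative and not part of any tool.

```python
from collections import Counter

NEEDLE = "Failed to set up mount unit: Invalid argument"

def errors_per_second(journal_lines, needle=NEEDLE):
    """Count matching journal lines per one-second timestamp.

    Assumes each line starts with a 15-character timestamp such as
    "Sep 25 19:33:12" (default short journalctl format).
    """
    counts = Counter()
    for line in journal_lines:
        if needle in line:
            counts[line[:15]] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Example: journalctl --since "1 days ago" | python3 this_script.py
    for stamp, n in sorted(errors_per_second(sys.stdin).items()):
        print(stamp, n)
```

Applied to the four lines quoted above, this reports two errors for each of the two seconds.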


Version-Release number of selected component (if applicable):

OCP 4.4
Logging Operator: 4.4.0-202008210157.p0 provided by Red Hat, Inc

How reproducible:

I was unable to reproduce the issue, but the customer has been able to reproduce it on 3 of their 4.4 clusters.

Steps to Reproduce:
1. Install the logging operator.
2. Allow data to populate.
3. Check the journal.

Actual results:

The journal is flooded with the above error message.

Comment 3 Jeff Cantrill 2020-10-07 13:43:46 UTC
Setting priority to low. Investigation of the must-gather shows the logging system in a healthy state.

Comment 4 Jeff Cantrill 2020-10-07 14:08:01 UTC
Working with the storage team, I was pointed to:

https://access.redhat.com/solutions/5038151
https://bugzilla.redhat.com/show_bug.cgi?id=1779813

There is nothing that can be done from the logging perspective to explicitly resolve the issue.

Comment 5 Jan Safranek 2020-10-07 16:14:15 UTC
Perhaps we can do something on the storage side. I can see the pod elasticsearch-cdm-7fc52t3q-2-5dd6cf7dbc-bfnvj.yaml running on node au1-ocpinf-d02.ocp4-lab.sarc.samsung.com. It uses PVC elasticsearch-elasticsearch-cdm-7fc52t3q-2, which is mounted on the node as:

/dev/sdb on /var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk type ext4 (rw,relatime,seclabel)

$ systemd-escape /var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk | wc -c
258

So it's over the systemd unit name limit, and systemd spams the log. The directory name must be shorter.

"ocp4-lab-t82zt" is the cluster prefix; I don't know whether the customer can make it shorter.
"dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk" is hardcoded in Kubernetes.
"11bc26b0-694e-4917-9e80-f9919c8df059" is the UUID of the volume (or the datastore?) and is hardcoded in Kubernetes.
"[NIM-ESX-VVOL-OCP-LAB] rfc4122" comes from the datastore + folder name. Can the customer use one with a shorter name / fewer dashes? Systemd escapes every "-" as 4 characters ("\x2d"). They need to save only a few characters to get under the limit.

On the OCP / Kubernetes side, we will try to fix the vSphere code so that it does not depend on the datastore name and always produces shorter directory names. This will take some time though.
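
To check a candidate datastore or folder name without access to a node, the escaping can be approximated in Python. This is a sketch under assumptions, not systemd's implementation: it mirrors plain `systemd-escape STRING` (no `--path`), where "/" becomes "-" and every byte outside letters, digits, ":", "_" and "." becomes a 4-character `\xNN` sequence. The names `systemd_escape` and `wc_c` are illustrative. The unquoted space in the command above makes the shell pass two arguments, which `systemd-escape` prints separated by a space; `wc_c` models that, plus the trailing newline.

```python
import string

# Characters assumed to pass through unescaped: letters, digits, ":", "_", ".".
_SAFE = set(string.ascii_letters + string.digits + ":_.")

def systemd_escape(text: str) -> str:
    """Approximate `systemd-escape STRING` (without --path)."""
    out = []
    for i, byte in enumerate(text.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")                 # path separator becomes a dash
        elif ch not in _SAFE or (i == 0 and ch == "."):
            out.append("\\x%02x" % byte)    # e.g. "-" becomes "\x2d" (4 chars)
        else:
            out.append(ch)
    return "".join(out)

def wc_c(args) -> int:
    """Byte count that `systemd-escape ARGS... | wc -c` would report."""
    escaped = " ".join(systemd_escape(a) for a in args)
    return len(escaped) + 1  # +1 for the trailing newline

PATH = ("/var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/"
        "[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/"
        "ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk")

if __name__ == "__main__":
    print(wc_c(PATH.split(" ")))  # 258, the same count as the command above
```

Each dash removed from the datastore or folder name saves 4 characters in the escaped form, while each plain letter or digit saves only 1. The real mount unit name is derived with path escaping plus a ".mount" suffix, so its exact length differs slightly from this count.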


Just to note: all pods are actually running, and Elasticsearch should work. It is only systemd spamming the log in the background.

Comment 13 Hemant Kumar 2020-11-12 21:24:32 UTC
I have given up on trying to drop the folder UUID from the volume path. That is too risky and can break all over the place. I am going for a simpler approach of reducing the prefix size - https://github.com/kubernetes/kubernetes/pull/96533

This should *somewhat* help with longer volume names that are on the boundary of 255 chars (like the one reported in this bug). For other cases, we will have to document the limit and suggest recommendations to the customer.

Comment 24 Jan Safranek 2021-03-19 14:35:34 UTC
*** Bug 1939416 has been marked as a duplicate of this bug. ***

Comment 26 Hemant Kumar 2021-03-19 16:57:32 UTC
I also filed a related systemd issue for this - https://bugzilla.redhat.com/show_bug.cgi?id=1940973

Comment 27 Hemant Kumar 2021-03-22 18:57:16 UTC
*** Bug 1940898 has been marked as a duplicate of this bug. ***

Comment 35 errata-xmlrpc 2021-07-27 22:33:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 40 Red Hat Bugzilla 2023-09-15 00:49:07 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.

