Bug 1884800
Summary: | Failed to set up mount unit: Invalid argument | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | dtarabor |
Component: | Storage | Assignee: | Hemant Kumar <hekumar> |
Storage sub component: | Kubernetes | QA Contact: | Wei Duan <wduan> |
Status: | CLOSED ERRATA | Keywords: | Reopened |
Severity: | high | Priority: | medium |
Version: | 4.4 | Target Release: | 4.8.0 |
Hardware: | Unspecified | OS: | Unspecified |
Last Closed: | 2021-07-27 22:33:30 UTC | Type: | Bug |
Bug Blocks: | 1915520 | | |
CC: | aos-bugs, darya, gvillani, hekumar, jcrumple, jsafrane, mgugino, ngirard, openshift-bugs-escalate, smulje, spasquie, sreber, ssonigra | | |
Description
dtarabor
2020-10-02 19:57:04 UTC
Setting priority to low. Investigation of the must-gather shows the logging system in a healthy state. The storage team pointed me to:

https://access.redhat.com/solutions/5038151
https://bugzilla.redhat.com/show_bug.cgi?id=1779813

There is nothing that can be done from the logging perspective to explicitly resolve the issue. Perhaps we can do something on the storage side.

I can see the elasticsearch-cdm-7fc52t3q-2-5dd6cf7dbc-bfnvj.yaml pod running on node au1-ocpinf-d02.ocp4-lab.sarc.samsung.com. It uses PVC elasticsearch-elasticsearch-cdm-7fc52t3q-2, which is mounted on the node as:

/dev/sdb on /var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk type ext4 (rw,relatime,seclabel)

$ systemd-escape /var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[NIM-ESX-VVOL-OCP-LAB] rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk | wc -c
258

So the escaped name is over the systemd limit, and systemd spams the log. The directory name must be shorter. Its parts come from:

"ocp4-lab-t82zt" is the cluster prefix. I don't know whether the customer can make it shorter.
"dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk" is hardcoded in Kubernetes.
"11bc26b0-694e-4917-9e80-f9919c8df059" is the UUID of the volume (or the datastore?) and is hardcoded in Kubernetes.
"[NIM-ESX-VVOL-OCP-LAB] rfc4122" comes from the datastore and folder name.

Can the customer use a datastore with a shorter name or fewer dashes? systemd escapes every "-" as 4 characters ("\x2d"), so they need to save only a few characters to get under the limit.

On the OCP / Kubernetes side, we will try to fix the vSphere code so that it does not depend on the datastore name and always produces shorter directory names. This will take some time, though.

Just to note: all pods are actually running, and Elasticsearch should work. systemd is only spamming the log in the background.
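The length arithmetic in the systemd-escape comment above can be reproduced without a systemd host. The following is a minimal sketch, not systemd's implementation: `systemd_escape` is a hypothetical helper that assumes the documented escaping rules for plain `systemd-escape STRING` (no `--path`), namely that "/" becomes "-", ASCII alphanumerics plus ":", "_" and "." are kept (a leading "." is escaped), and every other byte becomes a 4-character "\xNN" sequence. Because the command in the report was unquoted, the shell passed the path as two arguments, which the sketch assumes are printed separated by a single space.

```python
import string

# Characters assumed to pass through unescaped (ASCII alphanumerics, ":", "_", ".").
SAFE = set(string.ascii_letters + string.digits + ":_.")


def systemd_escape(s: str) -> str:
    """Approximate `systemd-escape STRING` (without --path); hypothetical helper."""
    out = []
    for i, ch in enumerate(s):
        if ch == "/":
            out.append("-")  # path separators become dashes
        elif ch in SAFE and not (i == 0 and ch == "."):
            out.append(ch)  # kept verbatim
        else:
            # Everything else, including "-", becomes \xNN per byte (4 chars each).
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


# The two shell arguments from the unquoted command in the report.
MOUNT_DIR = "/var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[NIM-ESX-VVOL-OCP-LAB]"
VOLUME = (
    "rfc4122.11bc26b0-694e-4917-9e80-f9919c8df059/"
    "ocp4-lab-t82zt-dynamic-pvc-0f13e3ad-97f8-41ab-9392-84562ef40d17.vmdk"
)

escaped = " ".join([systemd_escape(MOUNT_DIR), systemd_escape(VOLUME)])
wc_c = len(escaped) + 1  # `wc -c` also counts the trailing newline
dashes = (MOUNT_DIR + VOLUME).count("-")

print(wc_c)  # 258, matching the report
print(dashes, "dashes cost", dashes * 3, "extra characters once escaped")
```

Each "-" grows from 1 to 4 characters when escaped, so every dash removed from the datastore or folder name saves 4 characters in the unit name, which is why trimming a few dashes is enough for a path this close to the limit.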
I have given up on trying to drop the folder UUID from the volume path. That is too risky and could break things all over the place. I am going for the simpler approach of reducing the prefix size: https://github.com/kubernetes/kubernetes/pull/96533

This should *somewhat* help with longer volume names that are on the boundary of 255 characters (like the one reported in this bug). For other cases, we will have to document the limitation and give recommendations to the customer.

*** Bug 1939416 has been marked as a duplicate of this bug. ***

I also filed a related systemd issue for this: https://bugzilla.redhat.com/show_bug.cgi?id=1940973

*** Bug 1940898 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.