Bug 2067190 - [Doc] [4.10.0] HPP-CSI-PVC fails to bind PVC when node fqdn is long
Summary: [Doc] [4.10.0] HPP-CSI-PVC fails to bind PVC when node fqdn is long
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Documentation
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Pan Ousley
QA Contact: Oded Ramraz
URL:
Whiteboard:
Depends On: 2057157
Blocks:
 
Reported: 2022-03-23 13:43 UTC by Yan Du
Modified: 2022-04-18 11:13 UTC
CC List: 9 users

Fixed In Version: 4.10
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2057157
Environment:
Last Closed: 2022-04-01 20:16:59 UTC
Target Upstream Version:
Embargoed:




Links
  System ID: Github kubernetes-csi external-provisioner pull 717
  Status: Merged
  Summary: fix managed-by label being too long when the node name is long.
  Last Updated: 2022-03-31 16:49:34 UTC

Description Yan Du 2022-03-23 13:43:39 UTC
+++ This bug was initially created as a clone of Bug #2057157 +++

Description of problem:

Logs from the csi-provisioner container in the hostpath-provisioner-csi pod:

E0222 17:52:54.088950       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: unable to parse requirement: values[0][csi.storage.k8s.io/managed-by]: Invalid value: "external-provisioner-TRIMMED": must be no more than 63 characters

Version-Release number of selected component (if applicable):
OCP-4.10.0
CNV-4.10.0-686

How reproducible:

The issue reproduces with both the hpp-csi-basic and the hpp-csi-pvc-block storage classes, using the WaitForFirstConsumer (WFFC) volumeBindingMode on the storage class. Note that when I change it to Immediate, it works.
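
For reference, this is roughly what the basic storage class looks like when recreated with Immediate binding (the configuration under which binding works for me); the name, provisioner, and storagePool parameter are copied from the manifests attached below:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-csi-basic
parameters:
  storagePool: hpp-csi-local-basic
provisioner: kubevirt.io.hostpath-provisioner
reclaimPolicy: Delete
# volumeBindingMode is immutable, so the class has to be deleted and
# recreated to switch from WaitForFirstConsumer to Immediate.
volumeBindingMode: Immediate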


Steps to Reproduce:
1. Deploy HPP-CSI on a cluster whose nodes have long FQDNs.
2. Try to bind a PVC (an example manifest is shown below, after the expected results).
3. Check the PVC status.

Actual results: The PVC is stuck in the Pending state.


Expected results: The PVC should bind to a PV.
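
For illustration, a minimal PVC of the kind used in step 2; the PVC name is made up for this example, the storage class name is taken from the manifests below, and with WaitForFirstConsumer binding a consuming pod is normally needed before the provisioner is asked to create a volume:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hpp-test-pvc        # hypothetical name, only for illustration
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: hostpath-csi-basic
  volumeMode: Filesystem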


Additional info:

apiVersion: v1
items:
- apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
  kind: HostPathProvisioner
  metadata:
    creationTimestamp: "2022-02-22T17:52:08Z"
    finalizers:
    - finalizer.delete.hostpath-provisioner
    generation: 12
    name: hostpath-provisioner
    resourceVersion: "28215"
    uid: 70dcf437-0b0b-4626-b6ae-fd405d397865
  spec:
    imagePullPolicy: IfNotPresent
    storagePools:
    - name: hpp-csi-local-basic
      path: /var/hpp-csi-local-basic
    - name: hpp-csi-pvc-block
      path: /var/hpp-csi-pvc-block
      pvcTemplate:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: local-block-hpp
        volumeMode: Block
    workload:
      nodeSelector:
        kubernetes.io/os: linux
  status:
    conditions:
    - lastHeartbeatTime: "2022-02-22T17:52:31Z"
      lastTransitionTime: "2022-02-22T17:52:31Z"
      message: Application Available
      reason: Complete
      status: "True"
      type: Available
    - lastHeartbeatTime: "2022-02-22T17:52:31Z"
      lastTransitionTime: "2022-02-22T17:52:31Z"
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2022-02-22T17:52:31Z"
      lastTransitionTime: "2022-02-22T17:52:09Z"
      status: "False"
      type: Degraded
    observedVersion: v4.10.0
    operatorVersion: v4.10.0
    storagePoolStatuses:
    - name: hpp-csi-local-basic
      phase: Ready
    - claimStatuses:
      - name: hpp-pool-5e8d1dd5
        status:
          accessModes:
          - ReadWriteOnce
          capacity:
            storage: 446Gi
          phase: Bound
      currentReady: 1
      desiredReady: 1
      name: hpp-csi-pvc-block
      phase: Ready
    targetVersion: v4.10.0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2022-02-22T17:52:08Z"
  name: hostpath-csi-basic
  resourceVersion: "27230"
  uid: 54847180-2570-4a32-9a95-4d4d208c5d69
parameters:
  storagePool: hpp-csi-local-basic
provisioner: kubevirt.io.hostpath-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer


---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-02-22T19:02:09Z"
  name: hostpath-csi-pvc-block
  resourceVersion: "178561"
  uid: e3dead94-111d-469a-a8af-c8796eaf3d54
parameters:
  storagePool: hpp-csi-pvc-block
provisioner: kubevirt.io.hostpath-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

--- Additional comment from Lukas Bednar on 2022-02-22 19:50:46 UTC ---

The name of the node is: cnv-qe-infra-23.cnvqe2.lab.eng.rdu2.redhat.com

This is the full line of the error:

E0222 19:18:16.371730       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: unable to parse requirement: values[0][csi.storage.k8s.io/managed-by]: Invalid value: "external-provisioner-cnv-qe-infra-23.cnvqe2.lab.eng.rdu2.redhat.com": must be no more than 63 characters
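
A quick length check (not part of the original log) shows why the value is rejected; Kubernetes limits label values to 63 characters:

$ echo -n "external-provisioner-cnv-qe-infra-23.cnvqe2.lab.eng.rdu2.redhat.com" | wc -c
67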

--- Additional comment from Denis Ollier on 2022-02-22 20:07:54 UTC ---

Note that another error with the same root cause was already spotted during the move to the HPP-CSI backend.

It is probably fixed already, but I am adding it here for reference:

> Alexander Wels, Dec 10, 7:25 PM
> Hah, just found a bug, because the node name is pretty long, and I use it in a label somewhere, the label length got exceeded, and a job was unable to be created.
> ```{"level":"error","ts":1639159272.0754337,"logger":"controller_hostpathprovisioner","msg":"Unable to create cleanup job","Request.Namespace":"","Request.Name":"hostpath-provisioner","name":"cleanup-pool-local-cnv-qe-infra-23.cnvqe2.lab.eng.rdu2.redhat.com","error":"Job.batch \"cleanup-pool-local-cnv-qe-infra-23.cnvqe2.lab.eng.rdu2.redhat.com\" is invalid: spec.template.labels: Invalid value: \"cleanup-pool-local-cnv-qe-infra-23.cnvqe2.lab.eng.rdu2.redhat.com\": must be no more than 63 characters","stacktrace":"kubevirt.io/hostpath-provisioner-operator/pkg/controller/hostpathprovisioner.(*ReconcileHostPathProvisioner).Reconcile\n\t/remote-source/app/pkg/controller/hostpathprovisioner/controller.go:307\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/co...```

--- Additional comment from Alexander Wels on 2022-02-23 13:59:36 UTC ---

The pertinent message is definitely this:

E0222 17:52:54.088950       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: unable to parse requirement: values[0][csi.storage.k8s.io/managed-by]: Invalid value: "external-provisioner-TRIMMED": must be no more than 63 characters

Basically, it is saying that it is unable to watch and manage the CSIStorageCapacity objects because the csi.storage.k8s.io/managed-by label value is too long. These objects are created by the CSI external-provisioner, which is one of the sidecars used by the HPP CSI driver. I have opened an issue against the external-provisioner [0]

[0] https://github.com/kubernetes-csi/external-provisioner/issues/707

--- Additional comment from Lukas Bednar on 2022-03-16 10:45:04 UTC ---

To work around this issue, disable the storageCapacity option on the HPP CSIDriver object:

$ oc patch csidriver kubevirt.io.hostpath-provisioner --type merge --patch '{"spec": {"storageCapacity": false}}'
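
To verify that the patch took effect, you can read the field back, for example:

$ oc get csidriver kubevirt.io.hostpath-provisioner -o jsonpath='{.spec.storageCapacity}{"\n"}'
false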

--- Additional comment from Yan Du on 2022-03-16 13:26:27 UTC ---

Is it too late to have a release note for this?

Comment 2 Pan Ousley 2022-03-31 21:52:51 UTC
Hi Yan, can you please review the following PR?

https://github.com/openshift/openshift-docs/pull/44069

Preview build: https://deploy-preview-44069--osdocs.netlify.app/openshift-enterprise/latest/virt/virt-4-10-release-notes#virt-4-10-known-issues

I am skipping to ON_QA due to urgency, but I also welcome a review from @awels :)

Thank you!

Comment 3 Yan Du 2022-04-01 06:47:02 UTC
@Catherine, thanks for your detailed explanation of the doc team process :)

@Pan, the doc lgtm. Thanks :)

Comment 4 Pan Ousley 2022-04-01 20:16:59 UTC
Thanks, Yan (and awels)! The known issue is live in the 4.10 release notes: https://docs.openshift.com/container-platform/4.10/virt/virt-4-10-release-notes.html#virt-4-10-known-issues

Closing this bug.

