Description of problem:
If you create a LocalVolumeSet - for example, one named local-ext4 - and it discovers one or more disks, it will successfully match the desired nodes and provision the PVs. However, if you delete the LocalVolumeSet, PersistentVolumes, and StorageClasses from your cluster, the symlinks created in /mnt/local-storage/local-ext4 do not get deleted. Ultimately, this means that if you were to create a new LocalVolumeSet, say local-blk, future nodes don't get provisioned.

Version-Release number of selected component (if applicable):
4.6.0-202010061132.p0

Expected results:
When a LocalVolumeSet is removed, the symlinks should be removed from /mnt/local-storage/<name>.
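The missing cleanup step amounts to removing the leftover symlinks from the per-storage-class directory. A minimal local simulation of that step, using a temp dir standing in for /mnt/local-storage/<storage-class> (paths and names are illustrative; this is not LSO's actual implementation):

```shell
#!/bin/sh
# Simulate /mnt/local-storage/<storage-class> in a temp dir (illustrative).
scdir="$(mktemp -d)"
: > "$scdir/not-a-link"          # a regular file, should survive
ln -s /dev/null "$scdir/dasdb1"  # a stale device symlink, should be removed

# The cleanup the operator would need on LocalVolumeSet deletion:
# delete only the symlinks, leave everything else alone.
for entry in "$scdir"/*; do
  if [ -L "$entry" ]; then
    rm -- "$entry"
  fi
done

ls "$scdir"   # prints: not-a-link
```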
*** Bug 1887468 has been marked as a duplicate of this bug. ***
We're planning to solve it as part of https://issues.redhat.com/browse/STOR-390; however, the scope of the requested changes is quite large and we need to redesign how the whole LSO daemonset works.
Hi, I came across the document section "Deleting the Local Storage Operator's resources -> Removing a local volume" in the 4.5 OpenShift documents, where it is recommended to delete the symlinks manually by logging in to the node as the root user while deleting the local volumes. Would that play a role here, or do we expect any changes to these instructions as a result of this bug? How would this look in the OpenShift documents for 4.6?

Reference: https://docs.openshift.com/container-platform/4.5/storage/persistent_storage/persistent-storage-local.html#deleting-the-local-storage-operators-resources

The document section is as below:
--------------------------------
1. Edit the previously created LocalVolume to remove any unwanted disks.
   a. Edit the cluster resource:
      $ oc edit localvolume <name> -n local-storage
   b. Navigate to the lines under devicePaths, and delete any representing unwanted disks.
2. Delete any PersistentVolumes created:
   $ oc delete pv <pv-name>
3. Delete any symlinks on the node.
   a. Create a debug pod on the node:
      $ oc debug node/<node-name>
   b. Change your root directory to the host:
      $ chroot /host
   c. Navigate to the directory containing the local volume symlinks, where <sc-name> is the name of the StorageClass used to create the local volumes:
      $ cd /mnt/local-storage/<sc-name>
   d. Delete the symlink belonging to the removed device:
      $ rm <symlink>

Thanks in advance!
4.6 is the same as 4.5 in this regard, still manual deletion of symlinks.
Hi, in OCP 4.6, I tried manually deleting the symlinks before re-creating the LocalVolume with another storageClassName. The PV on the new node added to the LocalVolume definition is not picked up by the cluster; it always displays the same list of PVs as from the first LocalVolume deployment. Deleting the entire /mnt/local-storage directory on the nodes did not help either. I followed the steps below.

$ oc version
Client Version: 4.6.0-0.nightly-s390x-2020-10-06-145952
Server Version: 4.6.0-0.nightly-s390x-2020-10-10-041058
Kubernetes Version: v1.19.0+d59ce34

1. Added secondary DASD disks to the worker-001, worker-002 and master-03 nodes.

2. Installed the Local Storage Operator using iib-pub-pending:v4.6:

apiVersion: v1
kind: Namespace
metadata:
  name: local-storage
---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: local-storage
spec:
  targetNamespaces:
  - local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: local-storage
spec:
  channel: "4.6"
  installPlanApproval: Automatic
  name: local-storage-operator
  source: redhat-operators-multi-arch-qe
  sourceNamespace: openshift-marketplace

$ oc get csv -A
NAMESPACE                              NAME                                           DISPLAY          VERSION                 REPLACES   PHASE
local-storage                          local-storage-operator.4.6.0-202010061132.p0   Local Storage    4.6.0-202010061132.p0              Succeeded
openshift-operator-lifecycle-manager   packageserver                                  Package Server   0.16.1                             Succeeded

3. Created the LocalVolume:

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-dasd-worker-master"
  namespace: "local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-001.ocp-t8359015.lnxne.boe
        - worker-002.ocp-t8359015.lnxne.boe
  storageClassDevices:
  - storageClassName: "local-sc-worker-master"
    volumeMode: Filesystem
    fsType: ext4
    devicePaths:
    - /dev/dasdb1

$ oc get sc -A
NAME                     PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-sc-worker-master   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  45s

$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS             REASON   AGE
local-pv-205a7997   20Gi       RWO            Delete           Available           local-sc-worker-master            42s
local-pv-7d75daac   20Gi       RWO            Delete           Available           local-sc-worker-master            47s

4. Deleted the LocalVolume and the PVs, and removed the symlinks by logging in as the root user on the worker nodes.

5. Created the LocalVolume again, now including the device path from master-03 as well:

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-dasd-worker-master"
  namespace: "local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-001.ocp-t8359015.lnxne.boe
        - worker-002.ocp-t8359015.lnxne.boe
        - master-03.ocp-t8359015.lnxne.boe
  storageClassDevices:
  - storageClassName: "local-sc-worker-master"
    volumeMode: Filesystem
    fsType: ext4
    devicePaths:
    - /dev/dasdb1

After creating this LocalVolume I see the same list of PVs as before. The PV from master-03 is not discovered and added to the list, which would be the ideal outcome. Neither creating the LocalVolume with another storage class name nor deleting the LSO and redeploying the LocalVolume helped. I also tried installing LSO in the openshift-local-storage namespace, but it didn't help.
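To double-check the manual symlink removal before re-creating the LocalVolume, one could list what is left under the storage class directory on each node. A hedged dry-run sketch that only prints the verification commands instead of executing them (node and storage-class names are taken from the reproduction above; the printed commands would be run against the cluster manually):

```shell
#!/bin/sh
# Dry run: print one verification command per node instead of executing it,
# so the commands can be reviewed before running them against the cluster.
# Each printed command lists whatever remains in the storage class symlink
# directory on that node.
sc="local-sc-worker-master"
nodes="worker-001.ocp-t8359015.lnxne.boe \
       worker-002.ocp-t8359015.lnxne.boe \
       master-03.ocp-t8359015.lnxne.boe"

out="$(for node in $nodes; do
  echo "oc debug node/$node -- chroot /host ls -l /mnt/local-storage/$sc"
done)"
echo "$out"
```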
LSO is likely not running on the master node, because scheduling pods there requires a special toleration.
At this point, the node master-03 has only one taint:

Taints: node-role.kubernetes.io/master:NoSchedule

No other taints or tolerations are set on the master-03 node. I am attaching the output of oc describe node master-03.ocp-t8359015.lnxne.boe for more clarity.
Created attachment 1721529 [details] oc describe node master-03.txt
Yes, that is what I was talking about. You need to add a toleration for that in the LocalVolume object. Please see: https://docs.openshift.com/container-platform/4.5/storage/persistent_storage/persistent-storage-local.html#local-tolerations_persistent-storage-local
The section at https://docs.openshift.com/container-platform/4.5/storage/persistent_storage/persistent-storage-local.html#local-tolerations_persistent-storage-local talks about including tolerations in the LocalVolume in general.

--------------------------------------------------------------------------------------------------------------
I find the documents fail to provide clarity on having local PVs on master nodes in two main aspects:
--------------------------------------------------------------------------------------------------------------
Aspect-I:
---------
In order to have a local PV on a master node, the LocalVolume definition must explicitly include the toleration below:

- key: node-role.kubernetes.io/master
  operator: Exists

If it is not included, the diskmaker and provisioner pods are not scheduled on that particular master, and the underlying PV is consequently never located. Since a default OCP cluster's master nodes have mastersSchedulable=false, the user has to include this toleration. However, the documentation does not include any note on the toleration that needs to be set in the LocalVolume to have local PVs on master nodes.
--------------------------------------------------------------------------------------------------------------
Aspect-II:
----------
The section "2. Optional: Allow local storage creation on master and infrastructure nodes."
https://docs.openshift.com/container-platform/4.3/storage/persistent_storage/persistent-storage-local.html#local-storage-install_persistent-storage-local

The position of these instructions in the docs is misleading, as it asks the user to apply the patch too early, at a stage where the Local Storage Operator is not even installed. "oc get ds" at this point returns no results, so one cannot apply these patches. As I observed, the DaemonSet resources are created only when a LocalVolume is deployed, not when the LSO operator is installed.

In general, applying the patches at this point helps the diskmaker and provisioner pods to be scheduled on master nodes, and the local PV is identified. However, the patches have no effect when the LocalVolume already tolerates node-role.kubernetes.io/master with operator Exists; I could see the local PVs on master nodes being displayed even before the patch was applied.

Expected results:
-----------------
1. The documentation should include clearer notes on the tolerations to be set in order to have local PVs on master nodes.
2. The section "2. Optional: Allow local storage creation on master and infrastructure nodes." should be moved from its existing place to later, towards "Using tolerations with Local Storage Operator Pods".

If needed, I shall create a separate bugzilla entry for the same. Thank you!
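For reference, a LocalVolume carrying the master toleration might look like the sketch below. Names are reused from the reproduction in this bug; this is an illustration under the assumption that LocalVolume accepts a spec.tolerations field as described in the linked docs, not an excerpt from them:

```yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-dasd-worker-master"
  namespace: "local-storage"
spec:
  # Without this toleration the diskmaker/provisioner pods are not
  # scheduled on masters tainted node-role.kubernetes.io/master:NoSchedule.
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - master-03.ocp-t8359015.lnxne.boe
  storageClassDevices:
  - storageClassName: "local-sc-worker-master"
    volumeMode: Filesystem
    fsType: ext4
    devicePaths:
    - /dev/dasdb1
```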
Yes, I think documentation about taints for LocalVolume CRs can be clarified further. Please open a doc bug for that.
(In reply to Jan Safranek from comment #3) > We're planning to solve it as part of > https://issues.redhat.com/browse/STOR-390, however, the number of requested > changes is quite big and we need to redesign how the whole LSO daemonset > works. Now that STOR-390 is being worked on, can we consider this for 4.7?
https://issues.redhat.com/browse/STOR-390 is feature complete, should we assume this issue to be fixed?
Sorry for late reply. I think https://issues.redhat.com/browse/STOR-390 does not solve cleanup of the symlinks, that's https://issues.redhat.com/browse/STOR-455. And that one is not available in 4.7.