Description of problem:
After creating a LocalVolume referencing devices at /dev/disk/by-path/ccw-0.0.0004, rebooting nodes causes duplicate PVs to be created if the device moves to a different kernel-assigned name (/dev/vd*). The generated PV paths reference those kernel names, not the by-path link.

Version-Release number of selected component (if applicable):
Multiple versions, the latest being ocp "4.10.0-rc.1" and lso version "4.10.0-202202071841"

How reproducible:
Any time the real device changes path.

Steps to Reproduce:
1. Create a LocalVolume referencing disks by /dev/disk/by-path
2. Reboot the nodes, possibly multiple times, to force a /dev/vd* path change

Actual results:
`oc get pv` shows more PVs than devices actually present.

Expected results:
When the disk is referenced by a stable path, a change in its kernel-assigned /dev/vd* name should not affect the persistent volumes.

localvolume:

```
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "openshift-local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
        - worker-1.pok-93.ocptest.pok.stglabs.ibm.com
  storageClassDevices:
  - storageClassName: "lso-fs"
    volumeMode: Filesystem
    fsType: ext4
    devicePaths:
    - /dev/disk/by-path/ccw-0.0.0004
```

PV Dump:

```
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-0.pok-93.ocptest.pok.stglabs.ibm.com-e4947a81-cf42-48d7-9819-ed77c5759955
      storage.openshift.com/device-name: vdc
    creationTimestamp: "2022-02-09T16:15:10Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-7694dd7d
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      uid: e4947a81-cf42-48d7-9819-ed77c5759955
    resourceVersion: "978317"
    uid: 274b1b63-3040-4653-97e2-bb29d9dacac2
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdc
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-0.pok-93.ocptest.pok.stglabs.ibm.com-e4947a81-cf42-48d7-9819-ed77c5759955
      storage.openshift.com/device-name: vdb
    creationTimestamp: "2022-02-09T16:08:51Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-cfd12a48
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      uid: e4947a81-cf42-48d7-9819-ed77c5759955
    resourceVersion: "975044"
    uid: 5139320d-81de-4e5f-8e64-a48f80e0be98
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdb
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-1.pok-93.ocptest.pok.stglabs.ibm.com-a29a3b5b-cb66-4439-b107-ce1eb637eb17
      storage.openshift.com/device-name: vdb
    creationTimestamp: "2022-02-09T16:08:45Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-1.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-f8753489
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-1.pok-93.ocptest.pok.stglabs.ibm.com
      uid: a29a3b5b-cb66-4439-b107-ce1eb637eb17
    resourceVersion: "975004"
    uid: 6f8e3d97-17ba-4f68-bdfc-c76f63a2dd99
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdb
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

Additional info:
The persistent volume paths show the /dev/vd* names, while the LocalVolume devicePath is /dev/disk/by-path/ccw-0.0.0004:

```
oc get pv -o yaml | grep path
    path: /mnt/local-storage/lso-fs/vdc
    path: /mnt/local-storage/lso-fs/vdb
    path: /mnt/local-storage/lso-fs/vdb
```

```
oc get localvolume -o yaml | grep path
      {"apiVersion":"local.storage.openshift.io/v1","kind":"LocalVolume","metadata":{"annotations":{},"name":"local-disks","namespace":"openshift-local-storage"},"spec":{"nodeSelector":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"kubernetes.io/hostname","operator":"In","values":["worker-0.pok-93.ocptest.pok.stglabs.ibm.com","worker-1.pok-93.ocptest.pok.stglabs.ibm.com"]}]}]},"storageClassDevices":[{"devicePaths":["/dev/disk/by-path/ccw-0.0.0004"],"fsType":"ext4","storageClassName":"lso-fs","volumeMode":"Filesystem"}]}}
      - /dev/disk/by-path/ccw-0.0.0004
```
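The mismatch can also be seen directly on a worker node and in the PV list; a quick sketch using the device from this report:

```
# which kernel name the by-path link resolves to right now
readlink -f /dev/disk/by-path/ccw-0.0.0004

# symlinks the provisioner created for the storage class; stale entries may
# still be named after an old /dev/vd* device that has since moved
ls -l /mnt/local-storage/lso-fs/

# PVs per node and path; more entries than physical disks indicates duplicates
oc get pv -o yaml | grep -E 'hostname:|path:'
```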
LSO prefers paths in `/dev/disk/by-id` over `/dev/disk/by-path`. There are historical reasons for that, and although I agree in general that falling back to `/dev/disk/by-path` when no suitable `/dev/disk/by-id` link exists might be reasonable, we are not currently doing that. So, can you please try to reproduce this issue using a `/dev/disk/by-id` path? We can still fix this issue, but if using a `by-id` path is possible for you, it will at least unblock you.
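To quickly check whether a usable by-id link exists for the disk at all, something like this should work (substitute your device name):

```
# list every stable symlink udev created for the device
udevadm info --query=symlink /dev/vdb | tr ' ' '\n'

# or look for it directly in /dev/disk/by-id
ls -l /dev/disk/by-id/ | grep -w vdb || echo "no by-id link for vdb"
```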
In the environment I've been using (zKVM with a qcow2 disk), my disk is only mapped under /dev/disk/by-path, not by-id. I'll continue to look for other solutions.

```
udevadm info /dev/vdb
P: /devices/css0/0.0.0002/0.0.0004/virtio2/block/vdb
N: vdb
S: disk/by-path/ccw-0.0.0004
E: DEVLINKS=/dev/disk/by-path/ccw-0.0.0004
E: DEVNAME=/dev/vdb
E: DEVPATH=/devices/css0/0.0.0002/0.0.0004/virtio2/block/vdb
E: DEVTYPE=disk
E: ID_PATH=ccw-0.0.0004
E: ID_PATH_TAG=ccw-0_0_0004
E: MAJOR=252
E: MINOR=16
E: SUBSYSTEM=block
```
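The by-id links for virtio disks are derived from a device serial, which this disk does not expose (there is no ID_SERIAL in the output above). A possible host-side workaround, not verified in this thread: assign a serial to the virtio disk in the guest definition so udev can create a /dev/disk/by-id/virtio-<serial> link.

```
# confirm the disk has no serial (so udev cannot build a by-id name for it)
udevadm info /dev/vdb | grep ID_SERIAL || echo "no ID_SERIAL -> no /dev/disk/by-id link"

# hypothetical QEMU fragment for the KVM host (names are examples only):
#   -drive file=/path/to/worker-0-data.qcow2,if=none,id=lsodisk \
#   -device virtio-blk-ccw,drive=lsodisk,serial=lso-worker-0-data
```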
I tested on zVM clusters that do have device links in /dev/disk/by-path, and those work as expected with LSO, whereas on zKVM neither qcow disks nor virtual device passthrough provide by-id links. I also tried /dev/disk/by-partuuid for the LocalVolume devicePath, but it hits the same problem as by-path.

While we wait for a fix, could we add documentation stating that only "by-id" is supported? Currently https://docs.openshift.com/container-platform/4.9/storage/persistent_storage/persistent-storage-local.html#local-volume-cr_persistent-storage-local says "local disks filepath to the LocalVolume resource, such as /dev/disk/by-id/wwn". Can we add a note that only "by-id" works? Should I create a separate bugzilla against the documentation?
LSO should use the device name from /dev/disk/by-id if it's available. If not, then the device path from the LocalVolume CR should be used, not /dev/sdX.
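In other words, the intended selection logic is roughly the following (a shell sketch of the intent, not the operator's actual code):

```
dev=/dev/vdb                              # resolved kernel device for the CR entry
cr_path=/dev/disk/by-path/ccw-0.0.0004    # devicePath as written in the LocalVolume CR

# prefer a stable by-id link if udev provides one...
by_id=$(udevadm info --query=symlink "$dev" | tr ' ' '\n' | grep '^disk/by-id/' | head -n1)

# ...otherwise fall back to the path from the CR, never the raw /dev/vdX name
symlink_target=${by_id:+/dev/$by_id}
symlink_target=${symlink_target:-$cr_path}
echo "provisioner should symlink against: $symlink_target"
```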
*** Bug 2059760 has been marked as a duplicate of this bug. ***
I tested this by creating LVM volumes with the udev rule that creates disk by-id links for LVM volumes disabled. I think the rule in question is /lib/udev/rules.d/13-dm-disk.rules; you can copy it to the `/etc/udev/rules.d` folder and modify it there.

Tom Dale - can you verify the https://github.com/openshift/local-storage-operator/pull/328 fix in your environment, by the way? You should be able to build an image with it using https://github.com/openshift/local-storage-operator/blob/master/hack/sync_bundle.
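For anyone who wants to reproduce that test setup, the override is roughly (a sketch; check which lines in the rule actually add the disk/by-id symlinks on your node before commenting them out):

```
# files in /etc/udev/rules.d take precedence over same-named files in /lib/udev/rules.d
cp /lib/udev/rules.d/13-dm-disk.rules /etc/udev/rules.d/13-dm-disk.rules

# comment out the lines that create /dev/disk/by-id/* symlinks for dm/LVM devices
sed -i '/disk\/by-id/ s/^/# /' /etc/udev/rules.d/13-dm-disk.rules

# reload the rules and re-trigger events so the links disappear
udevadm control --reload
udevadm trigger
```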
1. Attach volume nvme1n1.

2. Check /dev/disk/by-id:
```
lrwxrwxrwx. 1 root root 13 Mar 9 03:24 nvme-nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Mar 9 03:24 nvme-Amazon_Elastic_Block_Store_vol089d83b3dc8eae915 -> ../../nvme1n1
```

3. Run `udevadm control -s`.

4. Delete the symlinks in /dev/disk/by-id for volume /dev/nvme1n1.

5. Check /dev/disk/by-path:
```
ls -lrt /dev/disk/by-path | grep nvme1n1
lrwxrwxrwx. 1 root root 13 Mar 9 03:24 pci-0000:00:1f.0-nvme-1 -> ../../nvme1n1
```

6. Create a LocalVolume with the by-path device path:
```
oc get localvolume example -o json | jq .spec
{
  "logLevel": "Normal",
  "managementState": "Managed",
  "storageClassDevices": [
    {
      "devicePaths": [
        "/dev/disk/by-path/pci-0000:00:1f.0-nvme-1"
      ],
      "fsType": "ext4",
      "storageClassName": "foobar",
      "volumeMode": "Filesystem"
    }
  ]
}
```

7. Check the PV path:
```
oc get pv -o yaml | grep path
    path: /mnt/local-storage/foobar/pci-0000:00:1f.0-nvme-1
```

8. Check the symlink created for the storage class:
```
ls -lrt /mnt/local-storage/foobar/
total 0
lrwxrwxrwx. 1 root root 41 Mar 10 01:58 pci-0000:00:1f.0-nvme-1 -> /dev/disk/by-path/pci-0000:00:1f.0-nvme-1
```

Full udev information for the device:
```
udevadm info /dev/nvme1n1
P: /devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
N: nvme1n1
S: disk/by-id/nvme-Amazon_Elastic_Block_Store_vol089d83b3dc8eae915
S: disk/by-id/nvme-nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001
S: disk/by-path/pci-0000:00:1f.0-nvme-1
E: DEVLINKS=/dev/disk/by-path/pci-0000:00:1f.0-nvme-1 /dev/disk/by-id/nvme-nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol089d83b3dc8eae915
E: DEVNAME=/dev/nvme1n1
E: DEVPATH=/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
E: DEVTYPE=disk
E: ID_MODEL=Amazon Elastic Block Store
E: ID_PATH=pci-0000:00:1f.0-nvme-1
E: ID_PATH_TAG=pci-0000_00_1f_0-nvme-1
E: ID_SERIAL=Amazon Elastic Block Store_vol089d83b3dc8eae915
E: ID_SERIAL_SHORT=vol089d83b3dc8eae915
E: ID_WWN=nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001
E: ID_WWN_WITH_EXTENSION=nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001
E: MAJOR=259
E: MINOR=5
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=2743697481
```

Tested with local-storage-operator.4.11.0-202203071904.

@hekumar can you double-check whether this verification approach is correct?
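As an extra sanity check after another reboot, the same paths can be re-verified (a sketch reusing the names from the steps above):

```
# the mount-point symlink should still resolve through the by-path link
readlink -f /mnt/local-storage/foobar/pci-0000:00:1f.0-nvme-1

# and the PV list should still show the by-path based name, with no extra PVs
oc get pv -o yaml | grep 'path:'
```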
Thanks for testing that. This seems correct.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069