+++ This bug was initially created as a clone of Bug #2052618 +++
Description of problem:
After creating a LocalVolume that references a device at /dev/disk/by-path/ccw-0.0.0004, rebooting the nodes causes duplicate PVs to be created if the device's kernel-assigned name (/dev/vd*) changes. The generated PV paths reference the kernel-assigned names, not the by-path link.
Version-Release number of selected component (if applicable):
Multiple versions, the latest being
OCP "4.10.0-rc.1" and LSO version "4.10.0-202202071841"
How reproducible:
Anytime the real device changes path.
Steps to Reproduce:
1. Create local volume referencing disks by /dev/disk/by-path
2. Reboot nodes, perhaps multiple times to force /dev/vd* path change
Actual results:
`oc get pv` shows more pvs than actual devices present
Expected results:
When a disk is referenced by path, changes to its /dev/vd* name should not affect the persistent volumes.
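The following toy model illustrates how the duplicate can arise. It is not LSO code: the `provision` function and its `use_kernel_name` flag are hypothetical, and the model only assumes that each symlink (and therefore each PV) is named after whatever the chosen naming scheme returns.

```python
# Toy model only, not LSO code. It mimics a provisioner that records one
# symlink (and therefore one PV) per distinct link name on a node.

def provision(symlink_dir, cr_path, resolved_kernel_name, use_kernel_name=True):
    """Record an entry for the device that cr_path currently resolves to."""
    if use_kernel_name:
        link_name = resolved_kernel_name          # e.g. "vdb"
    else:
        link_name = cr_path.rsplit("/", 1)[-1]    # e.g. "ccw-0.0.0004"
    # An existing entry is kept; a new name creates a new entry.
    symlink_dir.setdefault(link_name, cr_path)
    return symlink_dir

cr_path = "/dev/disk/by-path/ccw-0.0.0004"

# Naming by kernel name: the disk is vdb on first boot and vdc after a reboot.
by_kernel = {}
provision(by_kernel, cr_path, "vdb")
provision(by_kernel, cr_path, "vdc")
print(sorted(by_kernel))   # two entries for one physical disk

# Naming by the path given in the LocalVolume: the name survives the reboot.
by_link = {}
provision(by_link, cr_path, "vdb", use_kernel_name=False)
provision(by_link, cr_path, "vdc", use_kernel_name=False)
print(sorted(by_link))     # one entry
```

In the first case the same disk ends up with two entries, which matches the `vdb` and `vdc` PVs reported for worker-0.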
localvolume:
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "openshift-local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
        - worker-1.pok-93.ocptest.pok.stglabs.ibm.com
  storageClassDevices:
  - storageClassName: "lso-fs"
    volumeMode: Filesystem
    fsType: ext4
    devicePaths:
    - /dev/disk/by-path/ccw-0.0.0004
PV Dump:
```
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-0.pok-93.ocptest.pok.stglabs.ibm.com-e4947a81-cf42-48d7-9819-ed77c5759955
      storage.openshift.com/device-name: vdc
    creationTimestamp: "2022-02-09T16:15:10Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-7694dd7d
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      uid: e4947a81-cf42-48d7-9819-ed77c5759955
    resourceVersion: "978317"
    uid: 274b1b63-3040-4653-97e2-bb29d9dacac2
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdc
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-0.pok-93.ocptest.pok.stglabs.ibm.com-e4947a81-cf42-48d7-9819-ed77c5759955
      storage.openshift.com/device-name: vdb
    creationTimestamp: "2022-02-09T16:08:51Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-cfd12a48
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      uid: e4947a81-cf42-48d7-9819-ed77c5759955
    resourceVersion: "975044"
    uid: 5139320d-81de-4e5f-8e64-a48f80e0be98
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdb
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-1.pok-93.ocptest.pok.stglabs.ibm.com-a29a3b5b-cb66-4439-b107-ce1eb637eb17
      storage.openshift.com/device-name: vdb
    creationTimestamp: "2022-02-09T16:08:45Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-1.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-f8753489
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-1.pok-93.ocptest.pok.stglabs.ibm.com
      uid: a29a3b5b-cb66-4439-b107-ce1eb637eb17
    resourceVersion: "975004"
    uid: 6f8e3d97-17ba-4f68-bdfc-c76f63a2dd99
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdb
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```
Additional info:
Persistent volume paths show /vd* names, while the LocalVolume device path is /dev/disk/by-path/ccw-0.0.0004
```
oc get pv -o yaml | grep path
path: /mnt/local-storage/lso-fs/vdc
path: /mnt/local-storage/lso-fs/vdb
path: /mnt/local-storage/lso-fs/vdb
```
oc get localvolume -o yaml | grep path
{"apiVersion":"local.storage.openshift.io/v1","kind":"LocalVolume","metadata":{"annotations":{},"name":"local-disks","namespace":"openshift-local-storage"},"spec":{"nodeSelector":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"kubernetes.io/hostname","operator":"In","values":["worker-0.pok-93.ocptest.pok.stglabs.ibm.com","worker-1.pok-93.ocptest.pok.stglabs.ibm.com"]}]}]},"storageClassDevices":[{"devicePaths":["/dev/disk/by-path/ccw-0.0.0004"],"fsType":"ext4","storageClassName":"lso-fs","volumeMode":"Filesystem"}]}}
- /dev/disk/by-path/ccw-0.0.0004
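A small sketch for spotting the surplus PVs from a PV listing. It assumes the input has the same shape as `oc get pv -o json` and relies only on the `kubernetes.io/hostname` label and `spec.local.path` field visible in the dump above; the function names and the `expected_per_node` parameter are hypothetical, and the sample data is abbreviated from this report.

```python
from collections import defaultdict

def pvs_per_node(pv_list):
    """Group local PV paths by the node named in the hostname label."""
    by_node = defaultdict(list)
    for pv in pv_list["items"]:
        node = pv["metadata"]["labels"]["kubernetes.io/hostname"]
        by_node[node].append(pv["spec"]["local"]["path"])
    return dict(by_node)

def nodes_with_surplus(pv_list, expected_per_node):
    """Return nodes that have more PVs than devices configured per node."""
    return {
        node: sorted(paths)
        for node, paths in pvs_per_node(pv_list).items()
        if len(paths) > expected_per_node
    }

def make_pv(node, path):
    """Build the minimal subset of a PV object that the helpers read."""
    return {
        "metadata": {"labels": {"kubernetes.io/hostname": node}},
        "spec": {"local": {"path": path}},
    }

# Abbreviated from the PV dump in this report.
sample = {
    "items": [
        make_pv("worker-0.pok-93.ocptest.pok.stglabs.ibm.com", "/mnt/local-storage/lso-fs/vdc"),
        make_pv("worker-0.pok-93.ocptest.pok.stglabs.ibm.com", "/mnt/local-storage/lso-fs/vdb"),
        make_pv("worker-1.pok-93.ocptest.pok.stglabs.ibm.com", "/mnt/local-storage/lso-fs/vdb"),
    ]
}

# The LocalVolume lists a single devicePath, so one PV per node is expected.
surplus = nodes_with_surplus(sample, expected_per_node=1)
print(surplus)
```

Against a live cluster, the same functions could be fed with `json.load(sys.stdin)` from `oc get pv -o json`.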
--- Additional comment from Hemant Kumar on 2022-02-09 17:17:12 UTC ---
LSO prefers paths in `/dev/disk/by-id` rather than `/dev/disk/by-path`. There are historical reasons for that, and although in general I agree that it might be suitable to fall back to `/dev/disk/by-path` if `/dev/disk/by-id` is not suitable, we are currently not doing that.
So, can you please try to reproduce this issue using `/dev/disk/by-id`? We can still fix this issue, but if it is possible to use a `by-id` path, it will at least unblock you.
--- Additional comment from Tom Dale on 2022-02-09 17:55:59 UTC ---
In the environment I've been using (zKVM with qcow2 disk) my disk is only mapped to /dev/disk/by-path, not by-id. I'll continue to look for other solutions.
```
udevadm info /dev/vdb
P: /devices/css0/0.0.0002/0.0.0004/virtio2/block/vdb
N: vdb
S: disk/by-path/ccw-0.0.0004
E: DEVLINKS=/dev/disk/by-path/ccw-0.0.0004
E: DEVNAME=/dev/vdb
E: DEVPATH=/devices/css0/0.0.0002/0.0.0004/virtio2/block/vdb
E: DEVTYPE=disk
E: ID_PATH=ccw-0.0.0004
E: ID_PATH_TAG=ccw-0_0_0004
E: MAJOR=252
E: MINOR=16
E: SUBSYSTEM=block
```
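To check many devices for a missing by-id link, the `udevadm info` text can be parsed mechanically. This is a minimal sketch that assumes the `S:` (symlink) and `E:` (environment) line prefixes shown above; the function names are hypothetical, and the sample is an excerpt of the output in this comment.

```python
def parse_udevadm_info(text):
    """Split `udevadm info` text into symlinks (S:) and properties (E:)."""
    info = {"symlinks": [], "env": {}}
    for line in text.splitlines():
        prefix, _, value = line.partition(": ")
        if prefix == "S":
            info["symlinks"].append(value)
        elif prefix == "E":
            key, _, val = value.partition("=")
            info["env"][key] = val
    return info

def has_by_id_link(info):
    """True if udev created at least one /dev/disk/by-id link for the device."""
    return any(link.startswith("disk/by-id/") for link in info["symlinks"])

# Excerpt of the zKVM output from this comment.
sample = """\
P: /devices/css0/0.0.0002/0.0.0004/virtio2/block/vdb
N: vdb
S: disk/by-path/ccw-0.0.0004
E: DEVLINKS=/dev/disk/by-path/ccw-0.0.0004
E: DEVNAME=/dev/vdb
E: ID_PATH=ccw-0.0.0004
"""

info = parse_udevadm_info(sample)
print(info["symlinks"])
print(has_by_id_link(info))
```

For this device the only symlink is the by-path one, so the by-id check comes back negative.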
--- Additional comment from Tom Dale on 2022-02-11 16:17:32 UTC ---
I tested on zVM clusters that do have device links in /dev/disk/by-path and these do work as expected with lso.
Whereas on zKVM, neither qcow nor virtual device passthrough gives by-id links. I've also tried using /dev/disk/by-partuuid for the LocalVolume devicePath, but the same problem occurs as when using by-path.
While we wait for a fix, could we add documentation that only "by-id" is supported? Currently in https://docs.openshift.com/container-platform/4.9/storage/persistent_storage/persistent-storage-local.html#local-volume-cr_persistent-storage-local the docs say "local disks filepath to the LocalVolume resource, such as /dev/disk/by-id/wwn". Can we add a note that only "by-id" works? Should I create a separate Bugzilla against the documentation?
--- Additional comment from Jan Safranek on 2022-03-01 15:22:29 UTC ---
LSO should use device name from /dev/disk/by-id, if it's available. If not, then the device name from LocalVolume CR should be used and not /dev/sdX.
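The selection rule described here can be sketched as follows. This is only an illustration of the stated rule, not the operator's implementation: `choose_link_name` is a hypothetical name, and picking the alphabetically first by-id link when several exist is an assumption made for the example.

```python
import os.path

def choose_link_name(cr_device_path, devlinks):
    """Pick a stable name for the symlink under /mnt/local-storage.

    Prefer a /dev/disk/by-id link when udev provides one; otherwise fall
    back to the basename of the path given in the LocalVolume CR. The
    kernel name (vdb, sdX, ...) is never used.
    """
    by_id = sorted(l for l in devlinks if l.startswith("/dev/disk/by-id/"))
    if by_id:
        # Assumption: any by-id link is acceptable, so take the first one.
        return os.path.basename(by_id[0])
    return os.path.basename(cr_device_path)

# zKVM case from this report: only a by-path link exists.
zkvm = choose_link_name(
    "/dev/disk/by-path/ccw-0.0.0004",
    ["/dev/disk/by-path/ccw-0.0.0004"],
)
print(zkvm)

# A device that does have by-id links (hypothetical serial for illustration).
with_id = choose_link_name(
    "/dev/disk/by-path/pci-0000:00:1f.0-nvme-1",
    [
        "/dev/disk/by-path/pci-0000:00:1f.0-nvme-1",
        "/dev/disk/by-id/nvme-example-serial-0001",
    ],
)
print(with_id)
```

Either way the resulting name is independent of the order in which the kernel enumerates disks.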
--- Additional comment from Jan Safranek on 2022-03-04 10:19:51 UTC ---
--- Additional comment from OpenShift Automated Release Tooling on 2022-03-07 19:50:25 UTC ---
Elliott changed bug status from MODIFIED to ON_QA.
This bug is expected to ship in the next 4.11 release created.
--- Additional comment from Chao Yang on 2022-03-09 14:16:42 UTC ---
Hi @hekumar, I am trying to verify this bug with the steps below. I could not get device path links here, but using the device name may not be correct. Could you give some suggestions?
1. Run `udevadm control -s` to stop the execution of events
2. attach volume
udevadm info /dev/nvme2n1
P: /devices/pci0000:00/0000:00:1e.0/nvme/nvme2/nvme2n1
N: nvme2n1
E: DEVNAME=/dev/nvme2n1
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/nvme/nvme2/nvme2n1
E: DEVTYPE=disk
E: MAJOR=259
E: MINOR=6
E: SUBSYSTEM=block
3. Create localvolume with device name
oc get localvolume example -o json | jq .spec
{
  "logLevel": "Normal",
  "managementState": "Managed",
  "storageClassDevices": [
    {
      "devicePaths": [
        "/dev/nvme1n1"
      ],
      "fsType": "ext4",
      "storageClassName": "test1",
      "volumeMode": "Filesystem"
    },
    {
      "devicePaths": [
        "/dev/nvme2n1"
      ],
      "fsType": "ext4",
      "storageClassName": "test2",
      "volumeMode": "Filesystem"
    }
  ]
}
4.oc get pv/local-pv-f420c11b -o yaml | grep path
path: /mnt/local-storage/test2/nvme2n1
--- Additional comment from Hemant Kumar on 2022-03-09 15:07:40 UTC ---
I tested this by creating LVM volumes with the udev rule that creates disk IDs for LVM volumes disabled. I think the rule in question is /lib/udev/rules.d/13-dm-disk.rules; you can copy it to the `/etc/udev/rules.d` folder and modify it.
Tom Dale, can you verify the fix in https://github.com/openshift/local-storage-operator/pull/328 in your environment? You should be able to build an image using https://github.com/openshift/local-storage-operator/blob/master/hack/sync_bundle.
--- Additional comment from Chao Yang on 2022-03-10 02:24:27 UTC ---
1. Attach volume nvme1n1
2. Check /dev/disk/by-id
lrwxrwxrwx. 1 root root 13 Mar 9 03:24 nvme-nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Mar 9 03:24 nvme-Amazon_Elastic_Block_Store_vol089d83b3dc8eae915 -> ../../nvme1n1
3. Run `udevadm control -s`
4. Delete the symlinks in /dev/disk/by-id for volume /dev/nvme1n1
5.
ls -lrt /dev/disk/by-path | grep nvme1n1
lrwxrwxrwx. 1 root root 13 Mar 9 03:24 pci-0000:00:1f.0-nvme-1 -> ../../nvme1n1
6.Create localvolume with device path
oc get localvolume example -o json | jq .spec
{
  "logLevel": "Normal",
  "managementState": "Managed",
  "storageClassDevices": [
    {
      "devicePaths": [
        "/dev/disk/by-path/pci-0000:00:1f.0-nvme-1"
      ],
      "fsType": "ext4",
      "storageClassName": "foobar",
      "volumeMode": "Filesystem"
    }
  ]
}
7.check pv path
oc get pv -o yaml | grep path
path: /mnt/local-storage/foobar/pci-0000:00:1f.0-nvme-1
8.
ls -lrt /mnt/local-storage/foobar/
total 0
lrwxrwxrwx. 1 root root 41 Mar 10 01:58 pci-0000:00:1f.0-nvme-1 -> /dev/disk/by-path/pci-0000:00:1f.0-nvme-1
udevadm info /dev/nvme1n1
P: /devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
N: nvme1n1
S: disk/by-id/nvme-Amazon_Elastic_Block_Store_vol089d83b3dc8eae915
S: disk/by-id/nvme-nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001
S: disk/by-path/pci-0000:00:1f.0-nvme-1
E: DEVLINKS=/dev/disk/by-path/pci-0000:00:1f.0-nvme-1 /dev/disk/by-id/nvme-nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol089d83b3dc8eae915
E: DEVNAME=/dev/nvme1n1
E: DEVPATH=/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
E: DEVTYPE=disk
E: ID_MODEL=Amazon Elastic Block Store
E: ID_PATH=pci-0000:00:1f.0-nvme-1
E: ID_PATH_TAG=pci-0000_00_1f_0-nvme-1
E: ID_SERIAL=Amazon Elastic Block Store_vol089d83b3dc8eae915
E: ID_SERIAL_SHORT=vol089d83b3dc8eae915
E: ID_WWN=nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001
E: ID_WWN_WITH_EXTENSION=nvme.1d0f-766f6c3038396438336233646338656165393135-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001
E: MAJOR=259
E: MINOR=5
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=2743697481
Tested with local-storage-operator.4.11.0-202203071904
@hekumar, can you confirm whether this verification approach is correct?
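The check in step 8 can also be scripted. The sketch below reads every symlink in a storage-class directory and reports whether its target is a `/dev/disk/by-*` link or something else, such as a kernel device name. The function names are hypothetical, and the demo builds a throwaway directory with dangling symlinks instead of touching /mnt/local-storage.

```python
import os
import tempfile

def classify_target(target):
    """Label a symlink target as stable (/dev/disk/by-*) or kernel-name."""
    if target.startswith("/dev/disk/by-"):
        return "stable"
    return "kernel-name"

def audit_symlink_dir(directory):
    """Map each symlink in the directory to the class of its target."""
    result = {}
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.islink(full):
            result[name] = classify_target(os.readlink(full))
    return result

# Demo directory standing in for /mnt/local-storage/<storageClassName>.
with tempfile.TemporaryDirectory() as demo_dir:
    # Target seen after the fix in this comment.
    os.symlink(
        "/dev/disk/by-path/pci-0000:00:1f.0-nvme-1",
        os.path.join(demo_dir, "pci-0000:00:1f.0-nvme-1"),
    )
    # Hypothetical entry of the kind reported in the original description.
    os.symlink("/dev/vdb", os.path.join(demo_dir, "vdb"))
    audit = audit_symlink_dir(demo_dir)

print(audit)
```

On a node, `audit_symlink_dir("/mnt/local-storage/foobar")` would be the equivalent of the `ls -lrt` check above.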
--- Additional comment from Hemant Kumar on 2022-03-10 10:59:33 UTC ---
Thanks for testing that. This seems correct.
--- Additional comment from Red Hat Bugzilla on 2022-05-05 07:47:22 UTC ---
remove performed by PnT Account Manager <pnt-expunge>
--- Additional comment from errata-xmlrpc on 2022-06-15 17:49:34 UTC ---
This bug has been added to advisory RHEA-2022:5069 by OpenShift Release Team Bot (ocp-build/buildvm.openshift.eng.bos.redhat.com)
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.9.45 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5879