Bug 2061995 - Node reboot causes duplicate persistent volumes
Summary: Node reboot causes duplicate persistent volumes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: s390x
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.z
Assignee: Hemant Kumar
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On: 2052618
Blocks: 2009709 2056590 2105453
Reported: 2022-03-08 21:12 UTC by Hemant Kumar
Modified: 2022-07-11 20:47 UTC (History)
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2052618
Environment:
Last Closed: 2022-03-28 12:03:50 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift local-storage-operator pull 333 0 None Merged Bug 2061995: Use device location provided by the user 2022-10-12 13:00:20 UTC
Red Hat Product Errata RHBA-2022:1026 0 None None None 2022-03-28 12:04:12 UTC

Description Hemant Kumar 2022-03-08 21:12:43 UTC
+++ This bug was initially created as a clone of Bug #2052618 +++

Description of problem:
After creating a LocalVolume referencing devices at /dev/disk/by-path/ccw-0.0.0004, rebooting nodes causes duplicate PVs to be created if the device switches location among the kernel-assigned device names (/dev/vd*). The generated PV paths reference those kernel names, not the by-path link.

Version-Release number of selected component (if applicable):
Multiple versions, the latest being "ocp 4.10.0-rc.1" and LSO version "4.10.0-202202071841".

How reproducible:
Any time the underlying device changes its kernel-assigned path.

Steps to Reproduce:
1. Create local volume referencing disks by /dev/disk/by-path
2. Reboot nodes, possibly multiple times, to force a /dev/vd* path change

Actual results:
`oc get pv` shows more PVs than devices actually present


Expected results:
When disks are referenced by a stable path, a change in the kernel-assigned /dev/vd* location should not affect the persistent volumes.
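The failure mode can be simulated without a cluster. If the provisioner resolves the stable by-path symlink to the kernel name at provision time and records that name, a reboot that reshuffles kernel names makes the same stable link resolve to a different name, which looks like a brand-new device. A minimal sketch in plain shell (all paths below are a throwaway mock tree, not real devices):

```shell
# Mock /dev tree: one disk, kernel name vdb, one stable by-path link.
dev=$(mktemp -d)
mkdir -p "$dev/disk/by-path"
touch "$dev/vdb"
ln -s ../../vdb "$dev/disk/by-path/ccw-0.0.0004"

# Provision time: the stable link is resolved and the kernel name recorded.
first=$(basename "$(readlink -f "$dev/disk/by-path/ccw-0.0.0004")")

# "Reboot": the kernel assigns the same disk a different name and udev
# repoints the stable link; the previously recorded name is now stale.
mv "$dev/vdb" "$dev/vdc"
ln -sf ../../vdc "$dev/disk/by-path/ccw-0.0.0004"
second=$(basename "$(readlink -f "$dev/disk/by-path/ccw-0.0.0004")")

echo "recorded at provision time: $first"    # vdb
echo "resolved after the reboot: $second"    # vdc
# One physical disk, two recorded names -> two PVs for the same device.
```

One disk thus yields two distinct recorded paths across reboots, which matches the duplicate PVs seen in `oc get pv` below.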

localvolume:
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "openshift-local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
              - worker-1.pok-93.ocptest.pok.stglabs.ibm.com
  storageClassDevices:
    - storageClassName: "lso-fs"
      volumeMode: Filesystem
      fsType: ext4
      devicePaths:
        - /dev/disk/by-path/ccw-0.0.0004

PV Dump:
```
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-0.pok-93.ocptest.pok.stglabs.ibm.com-e4947a81-cf42-48d7-9819-ed77c5759955
      storage.openshift.com/device-name: vdc
    creationTimestamp: "2022-02-09T16:15:10Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-7694dd7d
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      uid: e4947a81-cf42-48d7-9819-ed77c5759955
    resourceVersion: "978317"
    uid: 274b1b63-3040-4653-97e2-bb29d9dacac2
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdc
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-0.pok-93.ocptest.pok.stglabs.ibm.com-e4947a81-cf42-48d7-9819-ed77c5759955
      storage.openshift.com/device-name: vdb
    creationTimestamp: "2022-02-09T16:08:51Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-cfd12a48
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-0.pok-93.ocptest.pok.stglabs.ibm.com
      uid: e4947a81-cf42-48d7-9819-ed77c5759955
    resourceVersion: "975044"
    uid: 5139320d-81de-4e5f-8e64-a48f80e0be98
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdb
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-0.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker-1.pok-93.ocptest.pok.stglabs.ibm.com-a29a3b5b-cb66-4439-b107-ce1eb637eb17
      storage.openshift.com/device-name: vdb
    creationTimestamp: "2022-02-09T16:08:45Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      kubernetes.io/hostname: worker-1.pok-93.ocptest.pok.stglabs.ibm.com
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
      storage.openshift.com/owner-kind: LocalVolume
      storage.openshift.com/owner-name: local-disks
      storage.openshift.com/owner-namespace: openshift-local-storage
    name: local-pv-f8753489
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-1.pok-93.ocptest.pok.stglabs.ibm.com
      uid: a29a3b5b-cb66-4439-b107-ce1eb637eb17
    resourceVersion: "975004"
    uid: 6f8e3d97-17ba-4f68-bdfc-c76f63a2dd99
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 20Gi
    local:
      fsType: ext4
      path: /mnt/local-storage/lso-fs/vdb
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1.pok-93.ocptest.pok.stglabs.ibm.com
    persistentVolumeReclaimPolicy: Delete
    storageClassName: lso-fs
    volumeMode: Filesystem
  status:
    phase: Available
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

Additional info:
The persistent volume paths reference kernel vd* names, while the LocalVolume devicePath is /dev/disk/by-path/ccw-0.0.0004:

```
oc get pv -o yaml | grep path
      path: /mnt/local-storage/lso-fs/vdc
      path: /mnt/local-storage/lso-fs/vdb
      path: /mnt/local-storage/lso-fs/vdb
```
```
oc get localvolume -o yaml | grep path
        {"apiVersion":"local.storage.openshift.io/v1","kind":"LocalVolume","metadata":{"annotations":{},"name":"local-disks","namespace":"openshift-local-storage"},"spec":{"nodeSelector":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"kubernetes.io/hostname","operator":"In","values":["worker-0.pok-93.ocptest.pok.stglabs.ibm.com","worker-1.pok-93.ocptest.pok.stglabs.ibm.com"]}]}]},"storageClassDevices":[{"devicePaths":["/dev/disk/by-path/ccw-0.0.0004"],"fsType":"ext4","storageClassName":"lso-fs","volumeMode":"Filesystem"}]}}
      - /dev/disk/by-path/ccw-0.0.0004
```

--- Additional comment from Hemant Kumar on 2022-02-09 17:17:12 UTC ---

LSO prefers paths in `/dev/disk/by-id` over `/dev/disk/by-path`. There are historical reasons for that, and although in general I agree it might be suitable to fall back to `/dev/disk/by-path` when `/dev/disk/by-id` is not available, we are currently not doing that.

So, can you please try to reproduce this issue using `/dev/disk/by-id`? We can still fix this issue, but if it is possible to use a `by-id` path, it will at least unblock you.
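On a live node, `udevadm info --query=symlink /dev/vdb` (or `ls -l /dev/disk/by-id/`) shows which stable links udev created for a device. The same check can be written as a small portable function; the device-root parameter is hypothetical and exists only so the sketch can run against a mock tree rather than real hardware:

```shell
# Print every stable link (by-id, by-path, by-uuid, ...) that resolves to
# the given kernel device name. The second argument defaults to /dev and
# is parameterized only so the sketch can be exercised on a mock tree.
stable_links() {
    devroot=$(readlink -f "${2:-/dev}")
    for dir in "$devroot"/disk/by-*; do
        [ -d "$dir" ] || continue
        for link in "$dir"/*; do
            [ -L "$link" ] || continue
            if [ "$(readlink -f "$link")" = "$devroot/$1" ]; then
                echo "$link"
            fi
        done
    done
}
```

A device that prints only a by-path entry, as in the udevadm dump in the next comment, has no by-id identity for LSO to prefer.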

--- Additional comment from Tom Dale on 2022-02-09 17:55:59 UTC ---

In the environment I've been using (zKVM with qcow2 disk) my disk is only mapped to /dev/disk/by-path, not by-id. I'll continue to look for other solutions.
```
udevadm info /dev/vdb
P: /devices/css0/0.0.0002/0.0.0004/virtio2/block/vdb
N: vdb
S: disk/by-path/ccw-0.0.0004
E: DEVLINKS=/dev/disk/by-path/ccw-0.0.0004
E: DEVNAME=/dev/vdb
E: DEVPATH=/devices/css0/0.0.0002/0.0.0004/virtio2/block/vdb
E: DEVTYPE=disk
E: ID_PATH=ccw-0.0.0004
E: ID_PATH_TAG=ccw-0_0_0004
E: MAJOR=252
E: MINOR=16
E: SUBSYSTEM=block
```

--- Additional comment from Tom Dale on 2022-02-11 16:17:32 UTC ---

I tested on zVM clusters that do have device links in /dev/disk/by-path, and these work as expected with LSO.

On zKVM, whether using qcow disks or virtual device passthrough, no by-id links are provided. I've also tried using /dev/disk/by-partuuid for the LocalVolume devicePath, but the same problem occurs as with by-path.

While we wait for a fix, could we add documentation that only "by-id" is supported? Currently the docs at https://docs.openshift.com/container-platform/4.9/storage/persistent_storage/persistent-storage-local.html#local-volume-cr_persistent-storage-local say "local disks filepath to the LocalVolume resource, such as /dev/disk/by-id/wwn". Can we add a note that only "by-id" works? Should I create a separate Bugzilla against the documentation?

--- Additional comment from Jan Safranek on 2022-03-01 15:22:29 UTC ---

LSO should use device name from /dev/disk/by-id, if it's available. If not, then the device name from LocalVolume CR should be used and not /dev/sdX.
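This selection rule is what the linked fix (PR 333, "Use device location provided by the user") implements. A hedged sketch of the logic in shell: the operator itself is written in Go, and the function name and device-root parameter here are hypothetical, present only to make the sketch runnable against a mock tree:

```shell
# Decide which path to record for a PV: prefer a /dev/disk/by-id link that
# resolves to the same device as the user's devicePath; otherwise keep the
# path exactly as the user wrote it (never the resolved /dev/vdX name).
pv_device_path() {
    user_path=$1                         # devicePath from the LocalVolume CR
    devroot=$(readlink -f "${2:-/dev}")  # mock-tree hook, defaults to /dev
    target=$(readlink -f "$user_path")
    for link in "$devroot"/disk/by-id/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            echo "$link"
            return 0
        fi
    done
    echo "$user_path"
}
```

With no by-id entries present, the recorded path stays /dev/disk/by-path/ccw-0.0.0004, so a reshuffled /dev/vd* name after a reboot no longer looks like a new device.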

--- Additional comment from Jan Safranek on 2022-03-04 10:19:51 UTC ---



--- Additional comment from OpenShift Automated Release Tooling on 2022-03-07 19:50:25 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.
This bug is expected to ship in the next 4.11 release.

Comment 2 Chao Yang 2022-03-19 11:58:16 UTC
```
oc get pv local-pv-c623e4c4 -o json | jq .spec
{
  "accessModes": [
    "ReadWriteOnce"
  ],
  "capacity": {
    "storage": "1Gi"
  },
  "local": {
    "fsType": "ext4",
    "path": "/mnt/local-storage/foobar/pci-0000:00:1f.0-nvme-1"
  },
  "nodeAffinity": {
    "required": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            {
              "key": "kubernetes.io/hostname",
              "operator": "In",
              "values": [
                "ip-10-0-152-80.us-east-2.compute.internal"
              ]
            }
          ]
        }
      ]
    }
  },
  "persistentVolumeReclaimPolicy": "Delete",
  "storageClassName": "foobar",
  "volumeMode": "Filesystem"
}
```

```
oc get csv
local-storage-operator.4.10.0-202203160637   Local Storage                      4.10.0-202203160637              Succeeded
```

Comment 5 errata-xmlrpc 2022-03-28 12:03:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1026

