Bug 1744385 - One device can be created to multiple PVs by local storage provisioner [NEEDINFO]
Summary: One device can be created to multiple PVs by local storage provisioner
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Rohan CJ
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks: 1874048
Reported: 2019-08-22 03:37 UTC by Qin Ping
Modified: 2020-10-27 15:54 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Validation was not implemented for this case.
Consequence: Multiple PVs could reference the same device unknowingly.
Fix: Add validation.
Result: Trying to create a PV on a block device where one is already provisioned by local-storage-operator will fail and result in an event.
Clone Of:
Environment:
Last Closed: 2020-10-27 15:54:19 UTC
Target Upstream Version:
assingh: needinfo? (hekumar)



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 15:54:52 UTC

Internal Links: 1784530

Description Qin Ping 2019-08-22 03:37:51 UTC
Description of problem:
One device can be created to multiple PVs by local storage provisioner

Version-Release number of selected component (if applicable):
quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.2.0-201908181300-ose-local-storage-operator
quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.2.0-201908181300-ose-local-storage-static-provisioner
quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.2.0-201908181300-ose-local-storage-diskmaker

How reproducible:
100%

Steps to Reproduce:
1. Upload the local storage operator to operatorhub as a custom operator
2. Install local storage operator from web console
3. Create a localvolume instance multiple times, each using the same devices.
$ oc get localvolume
NAME            AGE
local-block     15m
local-disks     3h26m
local-disks-1   9m19s
$ oc get localvolume -o json | jq '.items[].spec.storageClassDevices'
[
  {
    "devicePaths": [
      "/dev/vdb",
      "/dev/vdc"
    ],
    "storageClassName": "local-block-sc",
    "volumeMode": "Block"
  }
]
[
  {
    "devicePaths": [
      "/dev/vdb",
      "/dev/vdc"
    ],
    "fsType": "xfs",
    "storageClassName": "local-sc",
    "volumeMode": "Filesystem"
  }
]
[
  {
    "devicePaths": [
      "/dev/vdb",
      "/dev/vdc"
    ],
    "fsType": "ext4",
    "storageClassName": "local-sc-1",
    "volumeMode": "Filesystem"
  }
]
4. Check PVs


Actual results:
$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM            STORAGECLASS     REASON   AGE
local-pv-10fa3f93   110Gi      RWO            Delete           Available                    local-block-sc            17m
local-pv-1ea4d1b    110Gi      RWO            Delete           Available                    local-block-sc            17m
local-pv-2aa4b470   110Gi      RWO            Delete           Available                    local-sc-1                11m
local-pv-2d25b407   100Gi      RWO            Delete           Available                    local-sc-1                11m
local-pv-3bfdc58a   110Gi      RWO            Delete           Available                    local-block-sc            17m
local-pv-3d6983a9   100Gi      RWO            Delete           Available                    local-block-sc            17m
local-pv-468480ce   98Gi       RWO            Delete           Available                    local-sc                  3h28m
local-pv-678864da   110Gi      RWO            Delete           Available                    local-sc-1                11m
local-pv-7f0e3ea3   110Gi      RWO            Delete           Available                    local-sc                  3h28m
local-pv-80479914   110Gi      RWO            Delete           Available                    local-block-sc            17m
local-pv-a8c3a161   110Gi      RWO            Delete           Available                    local-sc-1                11m
local-pv-aa19783b   110Gi      RWO            Delete           Available                    local-sc                  3h28m
local-pv-ad05de22   98Gi       RWO            Delete           Available                    local-block-sc            17m
local-pv-b33f0678   110Gi      RWO            Delete           Available                    local-sc                  3h28m
local-pv-bf5b6b69   100Gi      RWO            Delete           Bound       storage/ebsc20   local-sc                  3h28m
local-pv-cd8a569    110Gi      RWO            Delete           Available                    local-sc-1                11m
local-pv-d48b22b8   98Gi       RWO            Delete           Available                    local-sc-1                11m
local-pv-f0b116     110Gi      RWO            Delete           Available                    local-sc                  3h28m

Expected results:
Only one PV should be created per device. If multiple PVs with different volumeMode or fsType values are allowed on the same device, the data on that device can be corrupted.
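
One way to picture the expected behavior is a guard that records which storage class owns each device on each node and rejects a second, different owner. The Python sketch below is purely illustrative and is not the operator's implementation: `DeviceRegistry`, `DeviceAlreadyProvisioned`, and the node name `node-a` are hypothetical names introduced here.

```python
class DeviceAlreadyProvisioned(Exception):
    """Raised when a device on a node already backs a PV for another storage class."""


class DeviceRegistry:
    """Tracks which storage class owns each (node, device) pair."""

    def __init__(self):
        self._owners = {}  # (node, device_path) -> storage class name

    def claim(self, node, device_path, storage_class):
        key = (node, device_path)
        owner = self._owners.get(key)
        if owner is not None and owner != storage_class:
            # A real controller would typically emit an event here
            # instead of (or in addition to) raising.
            raise DeviceAlreadyProvisioned(
                f"{device_path} on {node} is already provisioned for {owner}"
            )
        # First claim, or a repeated claim by the same owner (idempotent).
        self._owners[key] = storage_class


registry = DeviceRegistry()
registry.claim("node-a", "/dev/vdb", "local-block-sc")
try:
    registry.claim("node-a", "/dev/vdb", "local-sc")
except DeviceAlreadyProvisioned as err:
    print("rejected:", err)
```

Keying on the node as well as the path matters because /dev/vdb on two different nodes is two different disks.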

Additional info:
When a new localvolume instance is created, new local-diskmaker and local-provisioner DaemonSets (ds) are created:
$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
local-block-local-diskmaker       3         3         3       3            3           <none>          25m
local-block-local-provisioner     3         3         3       3            3           <none>          25m
local-disks-1-local-diskmaker     3         3         3       3            3           <none>          19m
local-disks-1-local-provisioner   3         3         3       3            3           <none>          19m
local-disks-local-diskmaker       3         3         3       3            3           <none>          3h35m
local-disks-local-provisioner     3         3         3       3            3           <none>          3h35m

Comment 7 Rohan CJ 2020-08-17 06:34:39 UTC
Fixed via https://github.com/openshift/local-storage-operator/pull/110

Not moving to ON_QA because I don't know if it's made it into builds yet.

@Christian could you clue me in on how to check that?

Comment 11 Chao Yang 2020-08-26 07:23:31 UTC
Verification failed with local-storage-operator.4.6.0-202008250930.p0

oc get localvolume -o json | jq '.items[].spec.storageClassDevices'
[
  {
    "devicePaths": [
      "/dev/nvme2n1"
    ],
    "fsType": "ext4",
    "storageClassName": "local-storage-sc",
    "volumeMode": "Filesystem"
  },
  {
    "devicePaths": [
      "/dev/nvme2n1"
    ],
    "fsType": "xfs",
    "storageClassName": "local-storage-sc1",
    "volumeMode": "Filesystem"
  }
]


oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS        REASON   AGE
local-pv-73be811a   1Gi        RWO            Delete           Available           local-storage-sc1            3s
local-pv-85c5f5a2   1Gi        RWO            Delete           Available           local-storage-sc             13s

Comment 12 Christian Huffman 2020-08-26 19:49:31 UTC
@Rohan,

I'm assigning this one to you, since you've been working in this area. Let me know if you have any issues or questions with this.

Comment 13 Rohan CJ 2020-08-27 05:21:38 UTC
ack, investigating

Comment 15 Rohan CJ 2020-09-02 17:30:32 UTC
I'm a little blocked on the RCA for this:

I'm not getting /dev/disk/by-id symlinks on devices attached to OCP 4.5.6 nodes on AWS. Filing a bug for that: https://bugzilla.redhat.com/show_bug.cgi?id=1874987
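
For context, /dev/disk/by-id holds udev-created symlinks that give a disk a stable name, whereas kernel names such as /dev/nvme2n1 can change between boots. The Python sketch below shows one way to check on a node which by-id links resolve to a given device. The function name `by_id_links` is illustrative, and whether the operator relies on these links for its duplicate check is an assumption drawn from this comment, not something the report states.

```python
import os


def by_id_links(device, by_id_dir="/dev/disk/by-id"):
    """Return the sorted names of symlinks in by_id_dir that resolve to device.

    An empty list means no stable by-id name is available for the device,
    which is the situation described in the comment above.
    """
    if not os.path.isdir(by_id_dir):
        return []
    target = os.path.realpath(device)
    return sorted(
        name
        for name in os.listdir(by_id_dir)
        if os.path.realpath(os.path.join(by_id_dir, name)) == target
    )


if __name__ == "__main__":
    # Device path taken from the verification attempt earlier in this report.
    print(by_id_links("/dev/nvme2n1"))
```

Comparing resolved paths rather than names means two different spellings of the same disk are treated as one device.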

Comment 18 Chao Yang 2020-09-03 08:14:51 UTC
I tried on 4.6.0-0.nightly-2020-09-02-210353.
This time, only one PV is provisioned for multiple localvolumes.
oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS       REASON   AGE
local-pv-87238f2e   1Gi        RWO            Delete           Available           local-storage-sc            2m16s

Comment 20 Chao Yang 2020-09-04 06:10:14 UTC
Yes. Update the bz status

Comment 21 Ashish Singh 2020-09-22 11:17:59 UTC
Hi Hemant/Rohan,

Do we have any BZ for 4.4.z fix?

Regards,
Ashish Singh

Comment 22 Rohan CJ 2020-10-12 12:00:35 UTC
I don't think there's a BZ, but the fix is merged: https://github.com/openshift/local-storage-operator/pull/158

Comment 24 errata-xmlrpc 2020-10-27 15:54:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

