Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Provision a PVC from another dynamic StorageClass (like gp2 on AWS)
2. Create a broad LocalVolumeSet with no nodeSelector or deviceInclusionSpec, so that it selects all free devices on all nodes (a sketch of such a LocalVolumeSet is at the end of this comment)

Actual results:
A local PV is provisioned on top of the gp2-backed PV's disk (and sometimes on OCS-backed ones)

Expected results:
The disk should be ignored because it has bind-mounts

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

$ oc debug node/ip-10-0-148-139.us-east-2.compute.internal
Starting pod/ip-10-0-148-139us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.148.139
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                          7:0    0   512G  0 loop
nvme0n1                      259:0    0   120G  0 disk
|-nvme0n1p1                  259:1    0   384M  0 part /boot
|-nvme0n1p2                  259:2    0   127M  0 part /boot/efi
|-nvme0n1p3                  259:3    0     1M  0 part
`-nvme0n1p4                  259:4    0 119.5G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 119.5G  0 dm   /sysroot
nvme1n1                      259:5    0    10G  0 disk /var/lib/kubelet/pods/625a85a7-6507-4909-b2f6-b2de726fa9bb/volumes/kubernetes.io~aws-ebs/pvc-36d8ccb7-adac-4a1a-99b5-baf0eab65d5c
nvme2n1                      259:6    0   512G  0 disk

$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-406f496f   512Gi      RWO            Delete           Available           fs                      69m

nvme2n1 is a gp2 disk, but it is used by the local PV below:

spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 512Gi
  local:
    path: /mnt/local-storage/fs/nvme-Amazon_Elastic_Block_Store_vol051148859c554e07c
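For reference, a minimal sketch of the kind of broad LocalVolumeSet meant in step 2 above (the exact CR is not preserved in this report; the name, namespace, and fsType are illustrative):

apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeSet
metadata:
  name: fs
  namespace: openshift-local-storage
spec:
  # No nodeSelector and no deviceInclusionSpec, so every free device
  # on every node is eligible, which is what exposes this bug.
  storageClassName: fs
  volumeMode: Filesystem
  fsType: xfs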
RCA is the same as https://github.com/openshift/local-storage-operator/pull/147/files: bind-mounts are not checked for, so if the PV doesn't otherwise make the device busy, LSO doesn't detect that it is in use. Working on a fix.
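For anyone reproducing this: a quick way to see the bind-mount that lsblk's MOUNTPOINT column misses is to search the host's mountinfo for the device, e.g. for nvme2n1 from the report above:

sh-4.4# chroot /host
sh-4.4# grep nvme2n1 /proc/1/mountinfo

If the device is bind-mounted into a pod (as happens with a block-mode PVC), a mountinfo entry referencing the device shows up even though lsblk prints no MOUNTPOINT for it.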
One addition to the "Steps to Reproduce" section, step 1: use a pod that is not running any IO on the disk. I used a fedora image running `sleep infinity` (a sketch below).
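A minimal sketch of such an idle pod, assuming a block-mode claim named pvc-block (all names and the image reference are illustrative, not the exact ones used):

apiVersion: v1
kind: Pod
metadata:
  name: idle-consumer
spec:
  containers:
  - name: idle
    image: registry.fedoraproject.org/fedora
    command: ["sleep", "infinity"]
    # Attach the claim as a raw block device so nothing ever writes to it
    volumeDevices:
    - devicePath: /dev/xvdf
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-block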
Blocker because it can cause writes to disks that already hold important data. Needs a backport.
Hello @rojoseph,

Could you show me the detailed steps?

1. Create a PVC/Pod so that a PV is provisioned from StorageClass sc-test:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-test
parameters:
  type: gp2
  fsType: ext4
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
mountOptions:
- "discard"

PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc3
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: 'sc-test'

Pod:

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu1
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep infinity; done;" ]
    volumeMounts:
    - mountPath: "/tmp1"
      name: aws1
  volumes:
  - name: aws1
    persistentVolumeClaim:
      claimName: pvc3

2. Check volumes on the node:

lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part /host/boot
`-nvme0n1p4 259:4    0 119.5G  0 part /host/sysroot
nvme1n1     259:5    0     1G  0 disk /host/var/lib/kubelet/pods/9a84cfed-b5b2-4063-a55d-da24f8a35ffb/volumes

What should I do next so that disk nvme1n1 ends up in the same state as your disk nvme2n1?
Use a block-mode StorageClass and PVC (a sketch follows below). Another note: please verify with a disk larger than 1G, because there was a PR that set the default minSize to 1G (which is slightly less than 1Gi). The PR I mentioned: https://github.com/openshift/local-storage-operator/pull/198
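To spell that out, a sketch of a block-mode PVC against the same sc-test StorageClass from the previous comment (the claim name is illustrative): the PVC sets volumeMode: Block, and the consuming pod uses volumeDevices instead of volumeMounts, so the device gets a bind-mounted device node rather than a filesystem mount.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-block
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block        # raw block device; no filesystem is created or mounted
  resources:
    requests:
      storage: 10Gi        # comfortably above the 1G default minSize noted above
  storageClassName: sc-test

In the pod from the previous comment, replace the volumeMounts section with:

    volumeDevices:
    - devicePath: /dev/xvdf  # device node exposed inside the container
      name: aws1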
lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0    10G  0 loop
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part /boot
`-nvme0n1p4 259:4    0 119.5G  0 part /sysroot
nvme1n1     259:5    0    10G  0 disk

oc get pv
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: aws://us-east-2b/vol-0741cbaf887738a4f

oc get localvolumediscoveryresult discovery-result-ip-10-0-67-71.us-east-2.compute.internal -o yaml
- deviceID: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0741cbaf887738a4f
  fstype: ""
  model: 'Amazon Elastic Block Store '
  path: /dev/nvme1n1
  property: NonRotational
  serial: vol0741cbaf887738a4f
  size: 10737418240
  status:
    state: NotAvailable
  type: disk
  vendor: ""

{"level":"info","ts":1615193494.5317512,"logger":"localvolumeset-symlink-controller","msg":"filter negative","Request.Namespace":"openshift-local-storage","Request.Name":"lvs","Device.Name":"nvme1n1","filter.Name":"noBindMounts"}

oc get localvolumeset
NAME   STORAGECLASS   PROVISIONED   AGE
lvs    lvs            0             25m

Passed with local-storage-operator.4.8.0-202103060050.p0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438