Bug 1929175
| Summary: | LocalVolumeSet: PV is created on disk belonging to other provisioner | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rohan CJ <rojoseph> |
| Component: | Storage | Assignee: | Rohan CJ <rojoseph> |
| Storage sub component: | Local Storage Operator | QA Contact: | Chao Yang <chaoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | aos-bugs, ocs-bugs, prsurve |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: Some busy disks were detected as free. Consequence: LSO would claim disks belonging to other provisioners. Fix: Check disks for bind-mounts. Result: LSO no longer claims those disks. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 22:44:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1932816 | | |
| Bug Blocks: | 1931765 | | |
The RCA is the same as https://github.com/openshift/local-storage-operator/pull/147/files: bind-mounts are not checked for, so if the PV doesn't make the device busy, LSO doesn't detect it. Working on a fix.

One addition to the "How To Reproduce" section: use a pod that is not running any IO on the disk. I used a Fedora image running `sleep infinity`.

Marking as a blocker because it can cause writes to disks that already hold important data. Needs backport.

Hello @rojoseph, could you show me the detailed steps?
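The fix checks the host's mount table for entries backed by the candidate device, which catches bind-mounts that `lsblk`'s MOUNTPOINT column can miss. The following is an illustrative Python sketch of that idea (LSO itself is written in Go; the function name and parsing here are my own, not the operator's code). In `/proc/<pid>/mountinfo`, optional fields end at a standalone `-` separator, and the mount source is the second field after it.

```python
def device_has_mounts(device: str, mountinfo_text: str) -> bool:
    """Return True if `device` is the mount source of any mountinfo entry.

    Sketch only: mirrors the bind-mount check described in the fix, not
    LSO's actual implementation.
    """
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Optional fields end at a standalone "-"; the mount source
        # (the backing device) is two fields after the separator,
        # following the filesystem type.
        if "-" in fields:
            sep = fields.index("-")
            if len(fields) > sep + 2 and fields[sep + 2] == device:
                return True
    return False


# Sample mountinfo line shaped like an EBS-backed kubelet volume mount:
sample = ("533 32 259:5 / /var/lib/kubelet/pods/625a85a7/volumes/"
          "kubernetes.io~aws-ebs/pvc-36d8ccb7 rw,relatime shared:50 "
          "- ext4 /dev/nvme1n1 rw")
print(device_has_mounts("/dev/nvme1n1", sample))  # busy: has a mount entry
print(device_has_mounts("/dev/nvme2n1", sample))  # free: no entry
```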
1. Create a PVC/Pod so that a PV is provisioned for storageclass sc-test.

StorageClass:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-test
parameters:
  type: gp2
  fsType: ext4
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - "discard"
```

PVC:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc3
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: 'sc-test'
```

Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu1
spec:
  containers:
    - name: ubuntu
      image: ubuntu:latest
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep infinity; done;" ]
      volumeMounts:
        - mountPath: "/tmp1"
          name: aws1
  volumes:
    - name: aws1
      persistentVolumeClaim:
        claimName: pvc3
```
2. Check volumes on the node:

```
lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part /host/boot
`-nvme0n1p4 259:4    0 119.5G  0 part /host/sysroot
nvme1n1     259:5    0     1G  0 disk /host/var/lib/kubelet/pods/9a84cfed-b5b2-4063-a55d-da24f8a35ffb/volumes
```

What should I do next to get the disk nvme1n1 above into the same state as your disk nvme2n1?
Use a block mode storageclass and PVC.

Another note: please verify with a disk larger than 1G, because there was a PR that set the default minSize to 1G (10^9 bytes, which is slightly less than 1Gi = 2^30 bytes). The PR I mentioned: https://github.com/openshift/local-storage-operator/pull/198

```
lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0    10G  0 loop
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part /boot
`-nvme0n1p4 259:4    0 119.5G  0 part /sysroot
nvme1n1     259:5    0    10G  0 disk
```
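The "block mode storageclass and PVC" suggestion maps to Kubernetes' raw block volume support. A minimal sketch of such a PVC follows; the name is hypothetical, `volumeMode: Block` is the standard Kubernetes field, and the size is chosen comfortably above the 1G default minSize noted above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-block          # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block        # raw block device, no filesystem mount
  resources:
    requests:
      storage: 10Gi        # larger than the 1G default minSize
  storageClassName: gp2
```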
`oc get pv` (spec excerpt):

```yaml
spec:
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: aws://us-east-2b/vol-0741cbaf887738a4f
```
`oc get localvolumediscoveryresult discovery-result-ip-10-0-67-71.us-east-2.compute.internal -o yaml` (excerpt):

```yaml
  - deviceID: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0741cbaf887738a4f
    fstype: ""
    model: 'Amazon Elastic Block Store              '
    path: /dev/nvme1n1
    property: NonRotational
    serial: vol0741cbaf887738a4f
    size: 10737418240
    status:
      state: NotAvailable
    type: disk
    vendor: ""
```
```
{"level":"info","ts":1615193494.5317512,"logger":"localvolumeset-symlink-controller","msg":"filter negative","Request.Namespace":"openshift-local-storage","Request.Name":"lvs","Device.Name":"nvme1n1","filter.Name":"noBindMounts"}
```

```
oc get localvolumeset
NAME   STORAGECLASS   PROVISIONED   AGE
lvs    lvs            0             25m
```
Passed with local-storage-operator.4.8.0-202103060050.p0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible: Sometimes

Steps to Reproduce:
1. Provision a PVC from another dynamic storageclass (like gp2 on Amazon)
2. Create a broad LocalVolumeSet, with no nodeSelector or inclusionSpec, to select all free devices and nodes

Actual results: A PV is provisioned for the gp2-based PV's disk (and sometimes OCS-based ones)

Expected results: The disk should be ignored, as it has bind-mounts

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

```
$ oc debug node/ip-10-0-148-139.us-east-2.compute.internal
Starting pod/ip-10-0-148-139us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.148.139
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                          7:0    0   512G  0 loop
nvme0n1                      259:0    0   120G  0 disk
|-nvme0n1p1                  259:1    0   384M  0 part /boot
|-nvme0n1p2                  259:2    0   127M  0 part /boot/efi
|-nvme0n1p3                  259:3    0     1M  0 part
`-nvme0n1p4                  259:4    0 119.5G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 119.5G  0 dm   /sysroot
nvme1n1                      259:5    0    10G  0 disk /var/lib/kubelet/pods/625a85a7-6507-4909-b2f6-b2de726fa9bb/volumes/kubernetes.io~aws-ebs/pvc-36d8ccb7-adac-4a1a-99b5-baf0eab65d5c
nvme2n1                      259:6    0   512G  0 disk
$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-406f496f   512Gi      RWO            Delete           Available           fs                      69m
```

nvme2n1 is a gp2 disk, but it is used by the PV below:

```yaml
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 512Gi
  local:
    path: /mnt/local-storage/fs/nvme-Amazon_Elastic_Block_Store_vol051148859c554e07c
```
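The "broad LocalVolumeSet" in step 2 of the reproducer could look roughly like this; the names match the `lvs` resource seen later in this report, but the field set is a hedged sketch against the LSO v1alpha1 API of that era, not a verified reproduction manifest:

```yaml
apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeSet
metadata:
  name: lvs
  namespace: openshift-local-storage
spec:
  storageClassName: lvs
  volumeMode: Filesystem
  # Deliberately no nodeSelector and no deviceInclusionSpec:
  # every node and every "free-looking" device matches, which is
  # exactly the broad selection that triggered this bug.
```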