Description of problem:
If local storage is the default storage class in the cluster, following the Wizard to create a VM via the web console fails during import. The generated VM YAML does not specify storageClassName (here the default local storage class is hdd) in the PVC spec, so the PVC cannot bind.

Version-Release number of selected component (if applicable):
cnv-libvirt-container-v1.4.0-6.1556622302

How reproducible:
100%

Steps to Reproduce:
1. Set local storage as the default storage class.
2. Follow the Wizard to create a VM from the web console.

Actual results:
Pod status: Importer Error
"0/3 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't find available persistent volumes to bind."
========================================================================
PVC status: the PVC is pending with message "waiting for first consumer to be created before binding"
========================================================================
PV status
[root@cnv-executor-shanks-master1 ~]# oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS   REASON   AGE
local-pv-17957d4f   6521Mi     RWO            Delete           Available                        hdd                     22h
local-pv-25e8c722   6521Mi     RWO            Delete           Available                        hdd                     22h
local-pv-3c0d2778   6521Mi     RWO            Delete           Available                        hdd                     14h
local-pv-4327fee4   6521Mi     RWO            Delete           Available                        hdd                     22h
local-pv-6017277d   6521Mi     RWO            Delete           Available                        hdd                     15h
local-pv-72584191   6521Mi     RWO            Delete           Available                        hdd                     22h
local-pv-7b1a6494   6521Mi     RWO            Delete           Bound       shanks/upload-test   hdd                     1h
local-pv-aa631362   6521Mi     RWO            Delete           Bound       shanks/cirros-dv     hdd                     1h
local-pv-c71f73db   6521Mi     RWO            Delete           Available                        hdd                     22h
=========================================================================
Storage classes in use
[root@cnv-executor-shanks-master1 ~]# oc get sc
NAME                PROVISIONER                    AGE
glusterfs-storage   kubernetes.io/glusterfs        23h
hdd (default)       kubernetes.io/no-provisioner   22h
=========================================================================
Generated VM YAML:
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    name.os.template.cnv.io/rhel7.6: Red Hat Enterprise Linux 7.6
  selfLink: /apis/kubevirt.io/v1alpha3/namespaces/test/virtualmachines/qwang-test
  resourceVersion: '289061'
  name: qwang-test
  uid: af77cb5a-7220-11e9-b335-fa163eb3676b
  creationTimestamp: '2019-05-09T06:07:11Z'
  generation: 1
  namespace: test
  labels:
    app: qwang-test
    flavor.template.cnv.io/small: 'true'
    os.template.cnv.io/rhel7.6: 'true'
    template.cnv.ui: openshift_rhel7-generic-small
    vm.cnv.io/template: rhel7-generic-small
    workload.template.cnv.io/generic: 'true'
spec:
  dataVolumeTemplates:
    - metadata:
        name: rootdisk-qwang-test
      spec:
        pvc:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
        source:
          http:
            url: >-
              https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
  running: true
  template:
    metadata:
      labels:
        vm.cnv.io/name: qwang-test
    spec:
      domain:
        cpu:
          cores: 1
          sockets: 1
          threads: 1
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: rootdisk
          interfaces:
            - bridge: {}
              name: nic0
          rng: {}
        resources:
          requests:
            memory: 2G
      networks:
        - name: nic0
          pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
        - dataVolume:
            name: rootdisk-qwang-test
          name: rootdisk

Expected results:
The PVC consumes a PV, the import completes, and the VM can run. Users do a kind of "one-button" VM creation in the UI; they don't care which storage is used behind the scenes.

Additional info:
If storageClassName is not specified on the PVC spec, the default storage class should be consumed. I'm not sure why the default local storage class has this problem.
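For illustration, this is a minimal sketch of what pinning the generated PVC to the local storage class explicitly would look like, instead of relying on the cluster default (field names follow the DataVolume PVC spec in the report above; setting hdd explicitly is an assumption for this example, not what the wizard produces):

spec:
  dataVolumeTemplates:
    - metadata:
        name: rootdisk-qwang-test
      spec:
        pvc:
          storageClassName: hdd   # explicit; omitted in the wizard-generated YAML above
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi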
I believe the UI uses the API correctly and the issue is in the underlying storage. @Adam, can you please have a look at the provided YAML and confirm it is the correct way to express that we want to use the default storage class? And if so, why is it not working? Thank you!
Moving to storage for handling. Please feel free to reassign to UI if you believe it is not using the API correctly.
This seems like a configuration problem with your cluster. Are all of your nodes marked as schedulable to run VMs? Do you have local volumes available on all nodes?
(In reply to Adam Litke from comment #4)
> Are all of your nodes marked as schedulable to run VMs?
Yes. Nodes are schedulable:
[root@cnv-executor-qwang-master1 ~]# oc get node -o yaml | grep schedulable
      kubevirt.io/schedulable: "true"
      kubevirt.io/schedulable: "true"
> Do you have local volumes available on all nodes?
Yes.
vdc                            253:32   0   20G  0 disk
|-vg_local_storage-lv_local1   252:5    0  6.6G  0 lvm  /mnt/local-storage/hdd/disk1
|-vg_local_storage-lv_local2   252:6    0  6.6G  0 lvm  /mnt/local-storage/hdd/disk2
`-vg_local_storage-lv_local3   252:7    0  6.6G  0 lvm  /mnt/local-storage/hdd/disk3
Qixuan, thanks for providing the information. From the info it seems only 2/3 nodes are schedulable. Does each node have three PVs configured? I'm concerned that the only node with available PVs remaining happens to be unschedulable. Please also show me the output of 'oc describe pvc rootdisk-qwang-test'.
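For reference, the node a local PV is pinned to can be inspected via its node affinity, e.g. (a sketch; the PV name is taken from the listing in the description):

oc get pv local-pv-17957d4f -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'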
[root@cnv-executor-qwang-master1 ~]# oc describe pvc rootdisk-qwang-vm-cirros
Name:          rootdisk-qwang-vm-cirros
Namespace:     bug
StorageClass:  hdd
Status:        Pending
Volume:
Labels:        app=containerized-data-importer
               cdi-controller=rootdisk-qwang-vm-cirros
Annotations:   cdi.kubevirt.io/storage.contentType=kubevirt
               cdi.kubevirt.io/storage.import.endpoint=https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
               cdi.kubevirt.io/storage.import.importPodName=importer-rootdisk-qwang-vm-cirros-pt52n
               cdi.kubevirt.io/storage.import.source=http
               cdi.kubevirt.io/storage.pod.phase=Pending
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  WaitForFirstConsumer  12s (x15 over 3m)  persistentvolume-controller  waiting for first consumer to be created before binding
@Qixuan, maybe it would be faster if you could provide me an environment where this is happening for you and I can take a look.
I can reproduce it with the CNV 1.4 async build. I didn't see this problem with CNV 2.0.
Hi Qixuan, I looked at your environment: your PVC is requesting 10Gi of storage, but your PVs are only 6521Mi. Please retry the steps with a smaller VM disk size or provide larger PVs.
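For a quick check of this kind of mismatch, the PVC request and the PV capacities can be compared directly (a sketch; the PVC name is taken from the report above):

oc get pvc rootdisk-qwang-test -o jsonpath='{.spec.resources.requests.storage}'
oc get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage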
Thanks, Adam. I'm not sure why a VM created from the UI wizard needs 10Gi of storage for the rootdisk. Is 10Gi the result of a tradeoff?
We need more informative events on PVCs to quickly indicate such a simple problem (no PVs with the requested size are available).
As per Comment 14, the problem is that the PVC events lack detail, making it hard to figure out the root cause of the issue. Changing the title accordingly and moving to openshift/storage.
The pod has events "2 node(s) didn't find available persistent volumes to bind"; maybe we can send those to the PVC too (Michelle approves :-)
This bug has been fixed upstream with https://github.com/kubernetes/kubernetes/pull/91455

PVC events should look like this:
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  WaitForFirstConsumer  20s (x6 over 87s)  persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  WaitForPodScheduled   5s                 persistentvolume-controller  waiting for pod pod-0 to be scheduled

This should give the user a hint that something may be wrong with pod-0 scheduling. Waiting for the 1.19 rebase to land.
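With the fix in place, these events can also be followed live from the CLI (a sketch; the PVC name is taken from the earlier report):

oc get events --field-selector involvedObject.kind=PersistentVolumeClaim,involvedObject.name=rootdisk-qwang-test -w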
Rebase (rc2) has landed, please check if it's OK.
Verified on 4.6.0-0.nightly-2020-08-16-072105

1. oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-4b27b3af   1Gi        RWO            Delete           Available           lvs-file                22h

2. Create a PVC with a storage request of 2Gi.

3. Create a pod that uses the PVC.

4. oc describe pvc
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  WaitForFirstConsumer  36s (x2 over 43s)  persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  WaitForPodScheduled   6s (x2 over 21s)   persistentvolume-controller  waiting for pod pod1 to be scheduled

5. oc describe pod
Events:
  Type     Reason            Age        From  Message
  ----     ------            ----       ----  -------
  Warning  FailedScheduling  <unknown>        0/6 nodes are available: 3 node(s) didn't find available persistent volumes to bind, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
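For completeness, a minimal sketch of the PVC and pod used in steps 2-3 (pod1 matches the event above and lvs-file is the storage class from the PV listing; the PVC name, image, and mount path are assumptions for illustration):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1          # assumed name; not shown in the verification output
spec:
  storageClassName: lvs-file
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi    # larger than the 1Gi PV, so binding must fail
---
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - name: test
      image: busybox  # assumed image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc1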
We won't release any 4.1 update.