Bug 2240963
| Summary: | VM shows DataVolumeError during DV provisioning | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | vsibirsk |
| Component: | Storage | Assignee: | Álvaro Romero <alromero> |
| Status: | CLOSED MIGRATED | QA Contact: | Harel Meir <hmeir> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.14.0 | CC: | akalenyu, dafrank, hmeir, jpeimer, ycui |
| Target Milestone: | --- | | |
| Target Release: | 4.14.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-12-14 16:12:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
vsibirsk
2023-09-27 11:15:52 UTC
Looks like we are getting the 'DataVolumeError' status when the DV gets restarted:

$ oc get vm -w
NAME                   AGE   STATUS                    READY
vm-cirros-source-ocs   0s
vm-cirros-source-ocs   0s
vm-cirros-source-ocs   0s
vm-cirros-source-ocs   0s
vm-cirros-source-ocs   0s    Provisioning              False
vm-cirros-source-ocs   0s    Provisioning              False
vm-cirros-source-ocs   15s   DataVolumeError           False
vm-cirros-source-ocs   16s   Provisioning              False
vm-cirros-source-ocs   17s   DataVolumeError           False
vm-cirros-source-ocs   21s   Provisioning              False
vm-cirros-source-ocs   34s   WaitingForVolumeBinding   False
vm-cirros-source-ocs   34s   Starting                  False
vm-cirros-source-ocs   34s   Starting                  False
vm-cirros-source-ocs   34s   Starting                  False
vm-cirros-source-ocs   44s   Starting                  False
vm-cirros-source-ocs   47s   Running                   False
vm-cirros-source-ocs   47s   Running                   True

$ oc get dv -w
NAME                   PHASE              PROGRESS   RESTARTS   AGE
cirros-dv-source-ocs                                            0s
cirros-dv-source-ocs                                            0s
cirros-dv-source-ocs   ImportScheduled    N/A                   0s
cirros-dv-source-ocs   ImportScheduled    N/A                   10s
cirros-dv-source-ocs   ImportInProgress   N/A                   13s
cirros-dv-source-ocs   ImportInProgress   N/A                   15s
cirros-dv-source-ocs   ImportInProgress   N/A        1          16s
cirros-dv-source-ocs   ImportInProgress   N/A        1          17s
cirros-dv-source-ocs   ImportScheduled    N/A        1          17s
cirros-dv-source-ocs   ImportScheduled    N/A        1          21s
cirros-dv-source-ocs   ImportInProgress   N/A        1          31s
cirros-dv-source-ocs   ImportInProgress   N/A        1          33s
cirros-dv-source-ocs   Succeeded          100.0%     1          34s
cirros-dv-source-ocs   Succeeded          100.0%     1          34s

(In reply to Jenia Peimer from comment #2)

Following up with a summary: this may or may not have to do with the scratch-space DataVolume restart, but we should definitely look into when kubevirt derives "DataVolumeError", since it doesn't look like it should have been fired in this case.
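One way to line up those two watches is to watch the relevant status fields directly (a minimal sketch; it only uses fields visible in the manifests below, via standard oc custom-columns):

$ oc get vm vm-cirros-source-ocs -w \
    -o custom-columns=NAME:.metadata.name,STATUS:.status.printableStatus
$ oc get dv cirros-dv-source-ocs -w \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,RESTARTS:.status.restartCount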
Still happening in v4.14.1.rhel9-56:

NAME                                   PHASE              PROGRESS   RESTARTS   AGE
datavolume.cdi.kubevirt.io/cirros-dv   ImportScheduled    N/A                   13s
datavolume.cdi.kubevirt.io/cirros-dv   ImportScheduled    N/A                   18s
datavolume.cdi.kubevirt.io/cirros-dv   ImportScheduled    N/A                   20s
datavolume.cdi.kubevirt.io/cirros-dv   ImportScheduled    N/A                   21s
datavolume.cdi.kubevirt.io/cirros-dv   ImportInProgress   N/A                   27s
datavolume.cdi.kubevirt.io/cirros-dv   ImportScheduled    N/A        1          32s

NAME                                              AGE   STATUS            READY
virtualmachine.kubevirt.io/vm-cirros-datavolume   13s   Provisioning      False
virtualmachine.kubevirt.io/vm-cirros-datavolume   18s   Provisioning      False
virtualmachine.kubevirt.io/vm-cirros-datavolume   20s   Provisioning      False
virtualmachine.kubevirt.io/vm-cirros-datavolume   21s   Provisioning      False
virtualmachine.kubevirt.io/vm-cirros-datavolume   27s   Provisioning      False
virtualmachine.kubevirt.io/vm-cirros-datavolume   32s   DataVolumeError   False

(In reply to Harel Meir from comment #4)

Hey Harel, could you check the DV conditions at the time of the DV error, and maybe share the VM manifest you are using? I'm not able to replicate it anymore with the fix; another condition, other than ScratchSpaceRequired, may be triggering this behavior.

Sure.
This is the template:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-cirros-source-ocs
  labels:
    kubevirt.io/vm: vm-cirros-source-ocs
spec:
  dataVolumeTemplates:
  - metadata:
      name: cirros-dv-source-ocs
    spec:
      storage:
        resources:
          requests:
            storage: 1Gi
        storageClassName: ocs-storagecluster-ceph-rbd-virtualization
      source:
        http:
          url: http://cnv-qe-server.cnv-qe.rhood.us/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros-source-ocs
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumev-ocs
        machine:
          type: ""
        resources:
          requests:
            memory: 100M
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv-source-ocs
        name: datavolumev-ocs
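To reproduce, the template can simply be applied and both resources watched from another terminal (a sketch; the file name vm-cirros-source-ocs.yaml is just a placeholder for wherever the manifest above is saved):

$ oc apply -f vm-cirros-source-ocs.yaml
$ oc get vm,dv -n default -w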
This is the VM when the error occurs:
- apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","metadata":{"annotations":{},"labels":{"kubevirt.io/vm":"vm-cirros-source-ocs"},"name":"vm-cirros-source-ocs","namespace":"default"},"spec":{"dataVolumeTemplates":[{"metadata":{"name":"cirros-dv-source-ocs"},"spec":{"source":{"http":{"url":"http://cnv-qe-server.cnv-qe.rhood.us/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2"}},"storage":{"resources":{"requests":{"storage":"1Gi"}},"storageClassName":"ocs-storagecluster-ceph-rbd-virtualization"}}}],"running":true,"template":{"metadata":{"labels":{"kubevirt.io/vm":"vm-cirros-source-ocs"}},"spec":{"domain":{"devices":{"disks":[{"disk":{"bus":"virtio"},"name":"datavolumev-ocs"}]},"machine":{"type":""},"resources":{"requests":{"memory":"100M"}}},"terminationGracePeriodSeconds":0,"volumes":[{"dataVolume":{"name":"cirros-dv-source-ocs"},"name":"datavolumev-ocs"}]}}}}
      kubemacpool.io/transaction-timestamp: "2023-11-21T13:34:52.420255043Z"
      kubevirt.io/latest-observed-api-version: v1
      kubevirt.io/storage-observed-api-version: v1
    creationTimestamp: "2023-11-21T13:34:22Z"
    finalizers:
    - kubevirt.io/virtualMachineControllerFinalize
    generation: 1
    labels:
      kubevirt.io/vm: vm-cirros-source-ocs
    name: vm-cirros-source-ocs
    namespace: default
    resourceVersion: "1277490"
    uid: 5d033b45-ad8c-462f-bfbb-6522db2e115d
  spec:
    dataVolumeTemplates:
    - metadata:
        creationTimestamp: null
        name: cirros-dv-source-ocs
      spec:
        source:
          http:
            url: http://cnv-qe-server.cnv-qe.rhood.us/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
        storage:
          resources:
            requests:
              storage: 1Gi
          storageClassName: ocs-storagecluster-ceph-rbd-virtualization
    running: true
    template:
      metadata:
        creationTimestamp: null
        labels:
          kubevirt.io/vm: vm-cirros-source-ocs
      spec:
        architecture: amd64
        domain:
          devices:
            disks:
            - disk:
                bus: virtio
              name: datavolumev-ocs
          machine:
            type: pc-q35-rhel9.2.0
          resources:
            requests:
              memory: 100M
        terminationGracePeriodSeconds: 0
        volumes:
        - dataVolume:
            name: cirros-dv-source-ocs
          name: datavolumev-ocs
  status:
    conditions:
    - lastProbeTime: "2023-11-21T13:34:23Z"
      lastTransitionTime: "2023-11-21T13:34:23Z"
      message: VMI does not exist
      reason: VMINotExists
      status: "False"
      type: Ready
    printableStatus: DataVolumeError
    volumeSnapshotStatuses:
    - enabled: true
      name: datavolumev-ocs
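Note that the conditions list above only carries Ready=False/VMINotExists; the error shows up solely in printableStatus. Both can be dumped together (a sketch using plain oc jsonpath, nothing beyond the fields shown above):

$ oc get vm vm-cirros-source-ocs -n default \
    -o jsonpath='{.status.printableStatus}{"\n"}{range .status.conditions[*]}{.type}={.status} reason={.reason}{"\n"}{end}'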
This is the DV:
- apiVersion: cdi.kubevirt.io/v1beta1
  kind: DataVolume
  metadata:
    annotations:
      cdi.kubevirt.io/storage.usePopulator: "true"
    creationTimestamp: "2023-11-21T13:34:23Z"
    generation: 1
    labels:
      kubevirt.io/created-by: 5d033b45-ad8c-462f-bfbb-6522db2e115d
    name: cirros-dv-source-ocs
    namespace: default
    ownerReferences:
    - apiVersion: kubevirt.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: VirtualMachine
      name: vm-cirros-source-ocs
      uid: 5d033b45-ad8c-462f-bfbb-6522db2e115d
    resourceVersion: "1277431"
    uid: 2d605e9e-33a5-4e10-9b85-614ab91ea098
  spec:
    source:
      http:
        url: http://cnv-qe-server.cnv-qe.rhood.us/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
    storage:
      resources:
        requests:
          storage: 1Gi
      storageClassName: ocs-storagecluster-ceph-rbd-virtualization
  status:
    claimName: cirros-dv-source-ocs
    conditions:
    - lastHeartbeatTime: "2023-11-21T13:34:23Z"
      lastTransitionTime: "2023-11-21T13:34:23Z"
      message: PVC cirros-dv-source-ocs Pending
      reason: Pending
      status: "False"
      type: Bound
    - lastHeartbeatTime: "2023-11-21T13:34:50Z"
      lastTransitionTime: "2023-11-21T13:34:23Z"
      status: "False"
      type: Ready
    - lastHeartbeatTime: "2023-11-21T13:34:50Z"
      lastTransitionTime: "2023-11-21T13:34:50Z"
      reason: Error
      status: "False"
      type: Running
    phase: ImportScheduled
    progress: N/A
    restartCount: 1
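The Running condition is the one that flips to reason=Error while the restarted import is being rescheduled, and is presumably what the VM controller surfaces as DataVolumeError. It can be extracted on its own (a sketch; standard oc jsonpath filter syntax):

$ oc get dv cirros-dv-source-ocs -n default \
    -o jsonpath='{.status.conditions[?(@.type=="Running")]}{"\n"}'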
This is the importer pod:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      cdi.kubevirt.io/storage.createdByController: "yes"
      sidecar.istio.io/inject: "false"
    creationTimestamp: "2023-11-21T13:34:51Z"
    labels:
      app: containerized-data-importer
      app.kubernetes.io/component: storage
      app.kubernetes.io/managed-by: cdi-controller
      app.kubernetes.io/part-of: hyperconverged-cluster
      app.kubernetes.io/version: 4.14.1
      cdi.kubevirt.io: importer
      prometheus.cdi.kubevirt.io: "true"
    name: importer-prime-9d56d3bb-8024-4130-acd7-167b481438c6
    namespace: default
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: PersistentVolumeClaim
      name: prime-9d56d3bb-8024-4130-acd7-167b481438c6
      uid: 2fe9da37-ac34-4eb0-85dc-fd7482784511
    resourceVersion: "1277459"
    uid: 7676617e-ba9d-4965-84c6-93e1b4fba94e
  spec:
    containers:
    - args:
      - -v=1
      env:
      - name: IMPORTER_SOURCE
        value: http
      - name: IMPORTER_ENDPOINT
        value: http://cnv-qe-server.cnv-qe.rhood.us/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
      - name: IMPORTER_CONTENTTYPE
        value: kubevirt
      - name: IMPORTER_IMAGE_SIZE
        value: "1073741824"
      - name: OWNER_UID
        value: 9d56d3bb-8024-4130-acd7-167b481438c6
      - name: FILESYSTEM_OVERHEAD
        value: "0"
      - name: INSECURE_TLS
        value: "false"
      - name: IMPORTER_DISK_ID
      - name: IMPORTER_UUID
      - name: IMPORTER_PULL_METHOD
      - name: IMPORTER_READY_FILE
      - name: IMPORTER_DONE_FILE
      - name: IMPORTER_BACKING_FILE
      - name: IMPORTER_THUMBPRINT
      - name: http_proxy
      - name: https_proxy
      - name: no_proxy
      - name: IMPORTER_CURRENT_CHECKPOINT
      - name: IMPORTER_PREVIOUS_CHECKPOINT
      - name: IMPORTER_FINAL_CHECKPOINT
      - name: PREALLOCATION
        value: "false"
      image: registry.redhat.io/container-native-virtualization/virt-cdi-importer-rhel9@sha256:75a9f754acba4cc158ebac58b161b70f964802a4ce9915cb20db413854af2830
      imagePullPolicy: IfNotPresent
      name: importer
      ports:
      - containerPort: 8443
        name: metrics
        protocol: TCP
      resources:
        limits:
          cpu: 750m
          memory: 600M
        requests:
          cpu: 100m
          memory: 60M
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - ALL
        runAsNonRoot: true
        runAsUser: 107
        seccompProfile:
          type: RuntimeDefault
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeDevices:
      - devicePath: /dev/cdi-block-volume
        name: cdi-data-vol
      volumeMounts:
      - mountPath: /scratch
        name: cdi-scratch-vol
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-api-access-n9z65
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    imagePullSecrets:
    - name: default-dockercfg-l8snt
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: OnFailure
    schedulerName: default-scheduler
    securityContext:
      fsGroup: 107
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    volumes:
    - name: cdi-data-vol
      persistentVolumeClaim:
        claimName: prime-9d56d3bb-8024-4130-acd7-167b481438c6
    - name: cdi-scratch-vol
      persistentVolumeClaim:
        claimName: prime-9d56d3bb-8024-4130-acd7-167b481438c6-scratch
    - name: kube-api-access-n9z65
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
        - configMap:
            items:
            - key: service-ca.crt
              path: service-ca.crt
            name: openshift-service-ca.crt
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2023-11-21T13:34:51Z"
      message: '0/6 nodes are available: persistentvolumeclaim "prime-9d56d3bb-8024-4130-acd7-167b481438c6-scratch"
        not found. preemption: 0/6 nodes are available: 6 Preemption is not helpful
        for scheduling..'
      reason: Unschedulable
      status: "False"
      type: PodScheduled
    phase: Pending
    qosClass: Burstable
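The PodScheduled message above points at the scratch PVC that did not exist yet when the importer pod was first scheduled. Both the PVC and the related scheduling events can be checked directly (a sketch, reusing the names from the pod spec above):

$ oc get pvc prime-9d56d3bb-8024-4130-acd7-167b481438c6-scratch -n default
$ oc get events -n default \
    --field-selector involvedObject.name=importer-prime-9d56d3bb-8024-4130-acd7-167b481438c6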
Moving back to Assigned, as we can still reproduce the issue.