Description of problem:
Happened to find that sometime after a VM restart, the VM is scheduled to another node for some reason. When we then try to trigger the memory dump again, the hp-volume pod stays in Pending status (volume node affinity conflict), and a new memory dump cannot be started because the previous one has not finished.

Version-Release number of selected component (if applicable):
CNV-v4.12.0-450

How reproducible:
Sometimes

Steps to Reproduce:
1. Create a VM
2. Do a memory dump
   $ virtctl memory-dump get vm-fedora-datavolume --claim-name=memoryvolume --create-claim
3. Restart the VM - sometimes the VM is scheduled to another node
4. Do a memory dump again
   $ virtctl memory-dump get vm-fedora-datavolume

Actual results:
$ oc get pod -n default
NAME                                       READY   STATUS    RESTARTS   AGE
hp-volume-w4nz8                            0/1     Pending   0          19h
virt-launcher-vm-fedora-datavolume-qjzxj   1/1     Running   0          19h

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  118m (x2088 over 19h)  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  14m                    default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  14m                    default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  9m52s                  default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 1 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  8m28s                  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.

Expected results:
Maybe we could have a friendly warning about why the memory dump failed in this situation?

Additional info:
Shelly, it would be nice if we could somehow detect and handle this situation automatically rather than requiring a manual workaround. I realize this is difficult because we don't want to leak storage details into KubeVirt. Maybe some sort of timeout, after which we discard the old PVC?
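For reference, a minimal sketch of the manual workaround as it stands today, assuming the claim from step 2 of the reproduction is named memoryvolume and that the memory-dump remove subcommand can disassociate the stuck dump volume in this state:

# Disassociate the stuck memory dump volume from the VM (assumption: remove
# works even though the previous dump never completed).
$ virtctl memory-dump remove vm-fedora-datavolume

# Delete the old claim, which is pinned to the original node.
$ oc delete pvc memoryvolume -n default

# Trigger a fresh dump; --create-claim provisions a new PVC that can bind on
# the node the VM is currently scheduled on.
$ virtctl memory-dump get vm-fedora-datavolume --claim-name=memoryvolume --create-claim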
Regarding the warning: the error seems to happen in the hotplug phase of the PVC for the memory dump. Yan, did you look at the volume status in the VMI to see if anything is shown there? Dealing with such a case is more relevant to the hotplug code. But regardless, currently the memory dump command just triggers the memory dump process and exits. Do we want it to wait until it completes? And if it does not complete within a defined period of time, do we want to return an error and disassociate the PVC? And if we also created the PVC in the process, delete it as well?
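For reference, the volume status in question can be pulled straight from the VMI (default namespace assumed):

# Hotplug/memory-dump volume status as reported on the VMI.
$ oc get vmi vm-fedora-datavolume -n default -o jsonpath='{.status.volumeStatus}'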
The vmi volume status is as below:

$ oc get vmi -o yaml
apiVersion: v1
items:
- apiVersion: kubevirt.io/v1
  kind: VirtualMachineInstance
  metadata:
    annotations:
      kubevirt.io/latest-observed-api-version: v1
      kubevirt.io/storage-observed-api-version: v1alpha3
    creationTimestamp: "2022-09-14T09:38:42Z"
    finalizers:
    - kubevirt.io/virtualMachineControllerFinalize
    - foregroundDeleteVirtualMachine
    generation: 13
    labels:
      kubevirt.io/nodeName: c01-yadu412-kjc7h-worker-0-n26rk
      kubevirt.io/vm: vm-datavolume
    name: vm-fedora-datavolume
    namespace: default
    ownerReferences:
    - apiVersion: kubevirt.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: VirtualMachine
      name: vm-fedora-datavolume
      uid: a692cd87-03a5-4cbb-b414-73877f5f9528
    resourceVersion: "240123"
    uid: 2150b759-0a27-4ec2-8bb8-a4d248e6023b
  spec:
    domain:
      cpu:
        cores: 1
        model: host-model
        sockets: 1
        threads: 1
      devices:
        disks:
        - disk:
            bus: virtio
          name: datavolumevolume
        interfaces:
        - masquerade: {}
          name: default
      features:
        acpi:
          enabled: true
      firmware:
        uuid: e69d93b8-45ca-5bd6-b02e-bf134bb338de
      machine:
        type: pc-q35-rhel8.6.0
      resources:
        requests:
          memory: 1024M
    networks:
    - name: default
      pod: {}
    terminationGracePeriodSeconds: 0
    volumes:
    - dataVolume:
        name: fedora-dv
      name: datavolumevolume
    - memoryDump:
        claimName: pvc1
        hotpluggable: true
      name: pvc1
  status:
    activePods:
      9557121e-81c7-466d-8293-c7114cb1e791: c01-yadu412-kjc7h-worker-0-n26rk
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2022-09-14T09:38:54Z"
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: null
      message: 'cannot migrate VMI: PVC fedora-dv is not shared, live migration requires
        that all PVCs must be shared (using ReadWriteMany access mode)'
      reason: DisksNotLiveMigratable
      status: "False"
      type: LiveMigratable
    - lastProbeTime: "2022-09-14T09:39:11Z"
      lastTransitionTime: null
      status: "True"
      type: AgentConnected
    guestOSInfo:
      id: fedora
      kernelRelease: 5.12.11-300.fc34.x86_64
      kernelVersion: '#1 SMP Wed Jun 16 15:47:58 UTC 2021'
      name: Fedora
      prettyName: Fedora 34 (Cloud Edition)
      version: "34"
      versionId: "34"
    interfaces:
    - infoSource: domain, guest-agent
      interfaceName: eth0
      ipAddress: 10.128.2.44
      ipAddresses:
      - 10.128.2.44
      mac: 52:54:00:82:3d:b6
      name: default
      queueCount: 1
    launcherContainerImageVersion: registry.redhat.io/container-native-virtualization/virt-launcher@sha256:35bdecc535e077fe19ec3fcdfc4e30d895acd806f330c9cb8435c1e1b0da7c00
    migrationMethod: BlockMigration
    migrationTransport: Unix
    nodeName: c01-yadu412-kjc7h-worker-0-n26rk
    phase: Running
    phaseTransitionTimestamps:
    - phase: Pending
      phaseTransitionTimestamp: "2022-09-14T09:38:42Z"
    - phase: Scheduling
      phaseTransitionTimestamp: "2022-09-14T09:38:43Z"
    - phase: Scheduled
      phaseTransitionTimestamp: "2022-09-14T09:38:54Z"
    - phase: Running
      phaseTransitionTimestamp: "2022-09-14T09:38:57Z"
    qosClass: Burstable
    runtimeUser: 107
    virtualMachineRevisionName: revision-start-vm-a692cd87-03a5-4cbb-b414-73877f5f9528-2
    volumeStatus:
    - name: datavolumevolume
      persistentVolumeClaimInfo:
        accessModes:
        - ReadWriteOnce
        capacity:
          storage: 10Gi
        filesystemOverhead: "0.055"
        requests:
          storage: 10Gi
        volumeMode: Filesystem
      target: vda
    - hotplugVolume:
        attachPodName: hp-volume-j69qt
      memoryDumpVolume:
        claimName: pvc1
      message: Created hotplug attachment pod hp-volume-j69qt, for volume pvc1
      name: pvc1
      persistentVolumeClaimInfo:
        accessModes:
        - ReadWriteOnce
        capacity:
          storage: 149Gi
        filesystemOverhead: "0.055"
        requests:
          storage: "1191182336"
        volumeMode: Filesystem
      phase: AttachedToNode
      reason: SuccessfulCreate
      target: ""
kind: List
metadata:
  resourceVersion: ""
$ oc describe vmi
----------8<--------------------
Volume Status:
  Name:  datavolumevolume
  Persistent Volume Claim Info:
    Access Modes:
      ReadWriteOnce
    Capacity:
      Storage:            10Gi
    Filesystem Overhead:  0.055
    Requests:
      Storage:    10Gi
    Volume Mode:  Filesystem
  Target:         vda
  Hotplug Volume:
    Attach Pod Name:  hp-volume-j69qt
  Memory Dump Volume:
    Claim Name:  pvc1
  Message:       Created hotplug attachment pod hp-volume-j69qt, for volume pvc1
  Name:          pvc1
  Persistent Volume Claim Info:
    Access Modes:
      ReadWriteOnce
    Capacity:
      Storage:            149Gi
    Filesystem Overhead:  0.055
    Requests:
      Storage:    1191182336
    Volume Mode:  Filesystem
  Phase:          AttachedToNode
  Reason:         SuccessfulCreate
  Target:
Events:
  Type    Reason            Age                    From                       Message
  ----    ------            ----                   ----                       -------
  Normal  SuccessfulCreate  9m18s                  virtualmachine-controller  Created virtual machine pod virt-launcher-vm-fedora-datavolume-wwh6r
  Normal  Created           9m3s                   virt-handler               VirtualMachineInstance defined.
  Normal  Started           9m3s                   virt-handler               VirtualMachineInstance started.
  Normal  SuccessfulCreate  8m54s                  virtualmachine-controller  Created attachment pod hp-volume-j69qt
  Normal  SuccessfulCreate  8m49s (x5 over 8m54s)  virtualmachine-controller  Created hotplug attachment pod hp-volume-j69qt, for volume pvc1
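To confirm the cause suggested by the FailedScheduling events, one can compare the node affinity of the PV backing the dump claim with the node the VMI is now running on. A hedged sketch (<pv-name> is whatever the first command returns):

# Find the PV bound to the memory dump claim.
$ oc get pvc pvc1 -n default -o jsonpath='{.spec.volumeName}'

# Inspect its node affinity; for node-local storage (e.g. hostpath-provisioner)
# this pins the volume to the node where the first dump ran.
$ oc get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'

# Compare with the node the VMI was rescheduled to after the restart.
$ oc get vmi vm-fedora-datavolume -n default -o jsonpath='{.status.nodeName}'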
OK, I see, so the hotplug volume status doesn't show any issue there. In that case we need to look into it. Regarding the memory dump behavior, waiting for @alitke's response.
Summarizing the grooming discussion: this will happen regardless of memory dump whenever the main disk is not topology constrained but some hotplugged volume is, for example ceph for the main disk and hpp for the hotplugged disk. It might make sense to open an extra bug for that. Maybe we want to set the hotplug volume status to failed when we detect such a situation as a short-term fix, but we should still decide whether the underlying issue here is hotplug-related, or whether we should focus on a more friendly virtctl memory-dump interaction. @alitke
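To illustrate the constellation described above, a quick way to check whether the dump claim landed on a topology-constrained provisioner while the root disk did not. This assumes the root disk PVC shares the DataVolume name fedora-dv; class names will differ per cluster:

# Provisioner and binding mode of the available storage classes; a node-local
# provisioner such as hostpath-provisioner is topology constrained.
$ oc get sc -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner,BINDING:.volumeBindingMode

# Storage class used by the root disk claim vs. the memory dump claim.
$ oc get pvc fedora-dv pvc1 -n default -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName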
This is definitely a generic hotplug issue, but in the specific case of memory dump I think we have an opportunity to improve the user experience. When a user wants to trigger a new memory dump and we are in this situation (a dump PVC that cannot be attached), we can simply remove the old PVC and create a new one. This is safe because the user has already told us that they want to replace the existing memory dump with a new one. I do think we will also encounter a similar error with VM export, and we need to look into how to handle it.
I think in order to do that we need to at least make the hotplug process fail or show some error, because otherwise I don't think we can know that the PVC cannot be attached; I don't think putting a timeout on that is right. If we can identify such an error, it should be possible to delete the current PVC and create a new one.