Bug 1946886
| Summary: | VM cloned from a VM (HPP) is stuck at "Starting" | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Guohua Ouyang <gouyang> |
| Component: | Storage | Assignee: | Alexander Wels <awels> |
| Status: | CLOSED NOTABUG | QA Contact: | Ying Cui <ycui> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.8.0 | CC: | aos-bugs, awels, cnv-qe-bugs, gouyang, yzamir |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1940296 | Environment: | |
| Last Closed: | 2021-04-08 11:53:14 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1940296 | | |
| Bug Blocks: | | | |
Description
Guohua Ouyang
2021-04-07 07:02:27 UTC
Cloning the bug to Storage for review. This problem can also be reproduced via these steps:

1. Add a source to a template (HPP).
2. Create a VM from the template.
3. Delete the source from the template.
4. Start the VM.

The VM is stuck at "Starting".

Alexander, can you please take a look? I saw you responding in the linked kubevirt-dev thread. Do we need more information to determine whether there is an issue to fix here?

The HPP storage class is what is called WaitForFirstConsumer (WFFC), and we recently introduced code into CDI and KubeVirt that respects WFFC until the VM runs, so what is happening is fully expected and not a problem. In the flow you describe, the following happens:

1. You create a DV, but the storage is WFFC, so the DV does NOT populate the PVC until a VM runs (that way we populate the PVC on the node the VM is scheduled to run on).
2. The PVC is not bound because it is waiting for its first consumer before binding, which also means the DV is not in the Succeeded phase.
3. Trying to clone a DV that is not in the Succeeded phase will not clone at all, because cloning an incomplete DV makes no sense.
4. Trying to start from the cloned DV (which is also not in the Succeeded phase) will not work, because the VM detects that the DV has not succeeded and will not start.

This worked the way you did it on OCS because OCS is shared storage and therefore uses immediate binding mode. There are two ways to make this work like it did before:

1. After creating the initial DV, start a VM that uses it. This gets the VM scheduled on a node, which in turn triggers CDI to populate the DV. Then stop the VM and do the rest as before. Because the DV is now in the Succeeded phase, the clone will also succeed (once you start the second VM). WFFC is applied again here, so the clone won't actually happen until the VM that uses the cloned DV is started, for the same reason: we don't want to clone until we know where the data is going.
2. Add the cdi.kubevirt.io/storage.bind.immediate.requested annotation to the DV (see the sketch at the end of this comment). This causes CDI to behave like before: it puts the data on a random node (which, by the way, might not be schedulable for a VM) and binds the PVC immediately.

For the second scenario:

> This should be a storage issue; reporting it in console kubevirt for review first, because the flow is quite normal:
> 1. Created an HPP VM; it could run well.
> 2. After some time, cloned a new VM from it.
> 3. Tried to start both VMs.

You cannot clone the disk of a running VM; the VM will actively be modifying it, so you have to shut the VM down first. The target PVC will also have events on it indicating that the source is in use.

Also note that this is completely unrelated to BZ#1924728, because the importer/cloner pod(s) are not even started, so there are no logs to point to. The only thing we could do that we are not doing right now is communicate, in the DV's conditions, the reason we are not doing something.

To recap, all of this is completely expected and correct behavior, because the hostpath storage uses WaitForFirstConsumer binding.
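For context, a minimal sketch of what a WaitForFirstConsumer storage class looks like; the class name and provisioner string here are assumptions and may not match the HPP deployment on the affected cluster:

```yaml
# Illustrative StorageClass with WaitForFirstConsumer binding.
# The name and provisioner are assumed values, not taken from this cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-provisioner
provisioner: kubevirt.io/hostpath-provisioner
reclaimPolicy: Delete
# PVCs using this class stay Pending until a consuming pod (here, the VM's
# launcher pod) is scheduled, which is why CDI delays import/clone until then.
volumeBindingMode: WaitForFirstConsumer
```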
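And a hedged sketch of workaround 2: a DataVolume carrying the bind-immediate annotation quoted above, so that CDI imports and binds the PVC right away instead of waiting for a consuming VM. The DV name, source URL, storage class name, and size are hypothetical placeholders:

```yaml
# Workaround 2 (sketch): request immediate binding on the DV.
# All names, the URL, and the size below are hypothetical.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: template-source-dv
  annotations:
    # Annotation referenced in the comment above; CDI then populates the PVC
    # on whatever node the importer pod lands on and binds it immediately.
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
spec:
  source:
    http:
      url: "http://example.com/images/disk.qcow2"  # hypothetical image source
  pvc:
    storageClassName: hostpath-provisioner         # assumed HPP class name
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 30Gi
```

With this annotation the DV should reach the Succeeded phase without a VM ever starting, after which the clone flow behaves as it did on immediate-binding storage.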