Bug 1914900
| Summary: | [CNV][Chaos] VM is not migrated to another node if the original virt-launcher pod is killed immediately |
|---|---|
| Product: | Container Native Virtualization (CNV) |
| Component: | Virtualization |
| Status: | CLOSED NOTABUG |
| Severity: | medium |
| Priority: | medium |
| Version: | 2.6.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Guohua Ouyang <gouyang> |
| Assignee: | sgott |
| QA Contact: | Israel Pinto <ipinto> |
| CC: | cnv-qe-bugs, gouyang, pkliczew |
| Last Closed: | 2021-01-13 13:10:32 UTC |
| Type: | Bug |
| Bug Blocks: | 1908661 |
Description
Guohua Ouyang
2021-01-11 12:59:47 UTC
Comment 1
sgott

I don't believe this is a valid test scenario. Killing the original pod should result in what you described. Can you please explain why you think it's a bug that a VM was restarted when you explicitly killed it?

Comment 2
Guohua Ouyang

(In reply to sgott from comment #1)
> I don't believe this is a valid test scenario. Killing the original pod
> should result in what you described. Can you please explain why you think
> it's a bug that a VM was restarted when you explicitly killed it?

It was added as a disruptive scenario for CNV chaos testing [1]. As we can see, the migration can succeed if we just wait a few more seconds before killing the pod, so it looks like the outcome depends on the state of the 2nd pod. I'm not sure whether this is a valid disruptive test scenario. @pkliczew, can you jump in here?

[1] https://issues.redhat.com/browse/CNV-8366

Comment 3
Piotr Kliczewski

@Stu, we work on chaos scenarios. We come up with different ways to break the cluster to see how resilient the code is in handling rare issues. If we can assume that the virt-launcher pod will always be there, then it is not a bug, but I am afraid it can be terminated/evicted for many reasons. The idea is to make sure that the user understands what happened and that the code handles the situation gracefully.

Comment 4
sgott

Completely understood about the scenario. As far as I can see, this is behaving exactly as expected. If you kill the source pod, a migration simply cannot happen. The VM will then be restarted or not based on the runStrategy of the VM. In other words, the existence of a migration object is not a guarantee that a migration can happen.

I am going to close this as not a bug. If you are sure I'm missing something, please re-open it.

Comment 5
Piotr Kliczewski

Guohua, do you see any inconsistencies, like the migration object being stuck in progress, or any other failures? I agree with Stu: if we fail gracefully, it is not a bug.

Comment 6
Guohua Ouyang

The thing that can make the results different is when the original pod is killed.

If the 1st pod is killed after the 2nd pod has reached the Running state, the migration can be done:

```
$ virtctl migrate vm1
VM vm1 was scheduled to migrate

$ oc get pod
NAME                      READY   STATUS              RESTARTS   AGE
virt-launcher-vm1-4nc7v   1/1     Running             0          4m19s
virt-launcher-vm1-shjmz   0/1     ContainerCreating   0          16s

$ oc get pod
NAME                      READY   STATUS    RESTARTS   AGE
virt-launcher-vm1-4nc7v   1/1     Running   0          4m24s
virt-launcher-vm1-shjmz   1/1     Running   0          21s

$ oc delete pod virt-launcher-vm1-4nc7v
pod "virt-launcher-vm1-4nc7v" deleted

$ oc get pod
NAME                      READY   STATUS    RESTARTS   AGE
virt-launcher-vm1-shjmz   1/1     Running   0          107s
```

If the 1st pod is killed immediately, the migration cannot be done:

```
$ virtctl migrate vm1
VM vm1 was scheduled to migrate

$ oc get pod
NAME                      READY   STATUS              RESTARTS   AGE
virt-launcher-vm1-96vfx   0/1     ContainerCreating   0          4s
virt-launcher-vm1-shjmz   1/1     Running             0          2m

$ oc delete pod virt-launcher-vm1-shjmz
pod "virt-launcher-vm1-shjmz" deleted

$ oc get pod
NAME                      READY   STATUS    RESTARTS   AGE
virt-launcher-vm1-bh5gh   1/1     Running   0          15s
```

Comment 7
Piotr Kliczewski

Guohua, in the second case, what is the status of the migration? Is there any information about why the migration failed?

Comment 8
Guohua Ouyang

(In reply to Piotr Kliczewski from comment #7)
> Guohua, in the second case, what is the status of the migration? Is there
> any information about why the migration failed?

It failed because the pod brought up by the migration was killed along with the original pod. A new pod always comes up afterwards, but it is no longer related to the migration.
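Comment 4 notes that whether the VM restarts depends on its runStrategy. As a minimal illustration (assuming a VM named vm1 as in the transcripts above), the strategy can be read directly from the VM spec; possible values include Always, RerunOnFailure, Manual, and Halted:

```
# Print the VM's run strategy; empty output means the VM uses
# spec.running instead of spec.runStrategy.
$ oc get vm vm1 -o jsonpath='{.spec.runStrategy}'
```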
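Comment 7 asks about the status of the failed migration. A hedged sketch of how that could be inspected, using the VirtualMachineInstanceMigration resource (short name vmim) and the migrationState reported on the VMI; the object name vm1 is carried over from the transcripts:

```
# List migration objects and their current phase.
$ oc get vmim

# Check whether the most recent migration was marked failed on the VMI.
$ oc get vmi vm1 -o jsonpath='{.status.migrationState.failed}'
```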
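Comment 2 observes that the scenario succeeds if the test waits a few more seconds before killing the pod. A sketch of a timing-robust variant of the chaos step, under the assumption that the source pod should only be deleted once the migration target pod is Ready (pod names are taken from the first transcript; a real test would discover them dynamically):

```
# Start the migration, then block until the target pod is Ready
# before deleting the source pod.
$ virtctl migrate vm1
$ oc wait --for=condition=Ready pod/virt-launcher-vm1-shjmz --timeout=120s
$ oc delete pod virt-launcher-vm1-4nc7v
```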