Bug 1963275
| Summary: | migration controller null pointer dereference | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | David Vossel <dvossel> |
| Component: | Virtualization | Assignee: | David Vossel <dvossel> |
| Status: | CLOSED ERRATA | QA Contact: | Israel Pinto <ipinto> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 2.5.6 | CC: | cnv-qe-bugs, fdeutsch, sgott, zpeng |
| Target Milestone: | --- | | |
| Target Release: | 2.6.6 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | virt-operator-container-v2.6.6-3 hco-bundle-registry-container-v2.6.6-24 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-10 17:33:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Vossel
2021-05-21 21:50:56 UTC
There is a PR posted upstream related to this: https://github.com/kubevirt/kubevirt/pull/5694

David,

Have you been able to actually reproduce this, or are the steps to reproduce in the description theoretical?

> David,
>
> Have you been able to actually reproduce this, or are the steps to reproduce
> in the description theoretical?
I've never reproduced this.
The crash is caused by the target pod failing before the handoff to virt-handler can occur. Theoretically it can be reproduced by deleting the target pod immediately once it is posted to the cluster before our migration controller can perform the handoff, but attempting to trigger this will be a race between virt-controller and the pod deletion.
Just so everyone is aware, we know where the crash is occurring based on production logs that link the crash loop directly to a specific line in the code that tries to dereference a null pointer, which is what the posted PR addresses. So this isn't a blind fix.
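As a rough illustration of the failure mode described above, the sketch below shows a handoff step that receives a nil target pod because the pod was deleted first, and a guard that fails the migration instead of dereferencing the pointer. This is not the KubeVirt code or the change in the upstream PR; the types `Pod` and `Migration`, the phase names, and the `handoff` function are all hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical stand-in for the migration target pod.
type Pod struct {
	Name     string
	NodeName string
}

// Phase is a hypothetical migration phase (names are assumptions).
type Phase string

const (
	PhaseScheduling  Phase = "Scheduling"
	PhaseTargetReady Phase = "TargetReady"
	PhaseFailed      Phase = "Failed"
)

// Migration is a hypothetical stand-in for a migration object.
type Migration struct {
	Name       string
	Phase      Phase
	TargetNode string
}

var errTargetPodGone = errors.New("target pod no longer exists")

// handoff records the target node on the migration.
// pod may be nil if the target pod was deleted before the handoff,
// so it is checked before any field is read.
func handoff(m *Migration, pod *Pod) error {
	if pod == nil {
		// Without this guard, pod.NodeName below would panic.
		m.Phase = PhaseFailed
		return errTargetPodGone
	}
	m.TargetNode = pod.NodeName
	m.Phase = PhaseTargetReady
	return nil
}

func main() {
	// A map lookup for a missing key yields a nil *Pod, which models
	// the target pod being deleted before the controller sees it.
	pods := map[string]*Pod{
		"target-a": {Name: "target-a", NodeName: "node-2"},
	}

	ok := &Migration{Name: "migration-a", Phase: PhaseScheduling}
	fmt.Println(handoff(ok, pods["target-a"]), ok.Phase, ok.TargetNode)

	lost := &Migration{Name: "migration-b", Phase: PhaseScheduling}
	fmt.Println(handoff(lost, pods["target-b"]), lost.Phase)
}
```

With the guard in place, the second migration ends in a failed phase and the process keeps running, which matches the outcome one would expect from a fix: a failed migration rather than a crash loop.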
Verified with build: hco 2.6.6-35, virt-operator-container-v2.6.6-5

Steps:
1. Create a VM and start it.
2. Start a migration.
3. Immediately delete the target pod as it appears.
4. Check the virt-controller status in openshift-cnv.

No crash loop occurs. The migration is failed. The VM is still running on the source node. virt-controller is in Running status.
Doing the live migration again works.
Tested with both Linux and Windows VMs.
Moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.6 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:3119