Bug 1901335
| Summary: | [CNV][Chaos] Vm is not paused when connection to storage is lost | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Ondra Machacek <omachace> |
| Component: | Virtualization | Assignee: | lpivarc |
| Status: | CLOSED ERRATA | QA Contact: | Kedar Bidarkar <kbidarka> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 2.5.0 | CC: | aasserzo, cnv-qe-bugs, dfediuck, fdeutsch, kbidarka, omachace, pkliczew, sgott, ycui |
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | virt-operator-container-v4.8.0-60 hco-bundle-registry-container-v4.8.0-375 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 14:21:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1908661, 1926746 | ||
|
Description
Ondra Machacek
2020-11-24 21:34:15 UTC
Ondra: Do you see it also with OCS?

The storage is not relevant here. The point here is that in case of any I/O failure, the VM is not paused, because of the default libvirt error policy. I added an example with NFS for simple reproduction steps. With OCS it should be the same.

Good catch. While at it, please consider the resume policy as well. For reference, see the available RHV options [1].

[1] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/virtual_machine_management_guide/index#Configuring_a_highly_available_virtual_machine

Adam, we're treating this as a Virtualization bug, but just pinging you so you're aware. Feel free to move it to the Storage component if you think that is more appropriate.

Hi, there was an addition of `error_policy=stop` recently in https://github.com/kubevirt/kubevirt/pull/4840. Could you confirm that the only piece we are missing is the propagation to status/conditions? The `resume` policy is specific to RHV and libvirt doesn't support it. Therefore I would like to ask for another feature request.

(In reply to lpivarc from comment #6)
> there was an addition of `error_policy=stop` recently in
> https://github.com/kubevirt/kubevirt/pull/4840. Could you confirm that the
> only piece we are missing is the propagation to status/conditions?
>

Yes, the conditions/status propagation is last missing piece to this bz, if it wasn't solved as part of that PR.

To verify, follow steps to reproduce in description.

Tested with
a) virt-operator version 4.8.0-60
b) NFS Storage ( Configured my own NFS storage )

1) With the below config

    [root@cnv-qe-01 pv101]# cat /etc/exports
    /data/nfs_shares/bm01-cnvqe-rdu2 *(rw,sync,no_wdelay,no_root_squash,insecure)

    [kbidarka@localhost ~]$ oc get vmi -o wide
    NAME         AGE   PHASE     IP           NODENAME             LIVE-MIGRATABLE   PAUSED
    vm2-rhel84   45m   Running   xx.yyy.d.s   node-13.redhat.com   True

The VMI is running successfully.

2) By limiting the NFS export to only the NFS Server itself.

    [root@cnv-qe-01 pv101]# cat /etc/exports
    /data/nfs_shares/bm01-cnvqe-rdu2 localhost(rw,sync,no_wdelay,no_root_squash,insecure)

    (cnv-tests) [kbidarka@localhost ~]$ oc get vmi -o wide
    NAME         AGE   PHASE     IP           NODENAME             LIVE-MIGRATABLE   PAUSED
    vm2-rhel84   46m   Running   xx.yyy.d.s   node-13.redhat.com   True              True

The VMI enters PAUSED state automatically.

    ~]$ oc rsh virt-launcher-vm2-rhel84-pq9lh
    sh-4.4# virsh list
     Id   Name                 State
    -----------------------------------
     1    default_vm2-rhel84   paused

Moving this bug to VERIFIED state.

See the below message:
Message: VMI was paused, IO error
Reason: PausedIOError
Status: True
Type: Paused
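
For anyone who wants to check this condition from a script rather than by reading the output, the following is a minimal sketch, not part of the fix itself. It assumes the VMI has been fetched as JSON (for example with `oc get vmi vm2-rhel84 -o json`) and that conditions appear under `status.conditions` with the usual `type`, `status`, `reason` and `message` keys. The helper names `find_condition` and `is_paused_on_io_error` and the `SAMPLE_VMI` data are hypothetical and only mirror the values reported in this bug.

```python
import json


def find_condition(vmi, cond_type):
    """Return the first status condition of the given type, or None."""
    for cond in vmi.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def is_paused_on_io_error(vmi):
    """True when the VMI reports a Paused condition caused by an I/O error."""
    cond = find_condition(vmi, "Paused")
    return (
        cond is not None
        and cond.get("status") == "True"
        and cond.get("reason") == "PausedIOError"
    )


# Hypothetical sample that mirrors the conditions quoted in this report.
SAMPLE_VMI = json.loads("""
{
  "status": {
    "conditions": [
      {"type": "LiveMigratable", "status": "True"},
      {"type": "Ready", "status": "True"},
      {"type": "AgentConnected", "status": "True"},
      {"type": "Paused", "status": "True",
       "reason": "PausedIOError",
       "message": "VMI was paused, IO error"}
    ]
  }
}
""")

if __name__ == "__main__":
    print(is_paused_on_io_error(SAMPLE_VMI))
```

In a real check, `SAMPLE_VMI` would be replaced by the parsed output of the `oc get` command above.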
---
  Volumes:
    Data Volume:
      Name: rhel84-dv2
    Name: datavolumedisk1
    Cloud Init No Cloud:
      User Data: #cloud-config
        password: redhat
        chpasswd: { expire: False }
    Name: cloudinitdisk
Status:
  Active Pods:
    ea34997a-968d-43a2-9ce8-0f0e5547247c: node-13.redhat.com
  Conditions:
    Last Probe Time: <nil>
    Last Transition Time: <nil>
    Status: True
    Type: LiveMigratable
    Last Probe Time: <nil>
    Last Transition Time: 2021-06-09T16:39:00Z
    Status: True
    Type: Ready
    Last Probe Time: 2021-06-09T16:39:09Z
    Last Transition Time: <nil>
    Status: True
    Type: AgentConnected
    Last Probe Time: 2021-06-09T17:25:23Z
    Last Transition Time: 2021-06-09T17:25:23Z
    Message: VMI was paused, IO error
    Reason: PausedIOError
    Status: True
    Type: Paused
  Guest OS Info:
    Id: rhel
    Kernel Release: 4.18.0-287.el8.dt4.x86_64
    Kernel Version: #1 SMP Thu Feb 18 13:31:55 EST 2021
    Name: Red Hat Enterprise Linux
    Pretty Name: Red Hat Enterprise Linux 8.4 (Ootpa)
    Version: 8.4
    Version Id: 8.4
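
The comments attribute the original behaviour to the default libvirt error policy and mention `error_policy=stop` as the fix. As a rough way to confirm which policy a running domain uses, the sketch below reads the `error_policy` attribute of each disk's `<driver>` element from libvirt domain XML, such as the output of `virsh dumpxml default_vm2-rhel84` inside the virt-launcher pod. The sample XML, its file paths and the helper name `disk_error_policies` are hypothetical and were not taken from this bug; only the attribute name comes from the discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML; a real one would come from `virsh dumpxml`.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>default_vm2-rhel84</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' error_policy='stop'/>
      <source file='/hypothetical/path/disk.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/hypothetical/path/noCloud.iso'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_error_policies(domain_xml):
    """Map each disk target device to its explicit error_policy, or None."""
    root = ET.fromstring(domain_xml)
    policies = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "unknown"
        # None means no explicit policy is set, so the default applies.
        policies[dev] = driver.get("error_policy") if driver is not None else None
    return policies


if __name__ == "__main__":
    for dev, policy in disk_error_policies(SAMPLE_DOMAIN_XML).items():
        print(dev, policy)
```

A disk that reports `None` here has no explicit policy, which is the situation the original report describes.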
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920 |