Bug 1917380
| Summary: | virt-handler removed from node when node label changed if workload placement specified | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | aschuett <aschuett> |
| Component: | Virtualization | Assignee: | aschuett <aschuett> |
| Status: | CLOSED ERRATA | QA Contact: | Kedar Bidarkar <kbidarka> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8.0 | CC: | cnv-qe-bugs, fdeutsch, kbidarka, rmohr, sgott |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | hco-bundle-registry-container-v4.8.0-347 virt-operator-container-v4.8.0-58 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 14:22:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
aschuett
2021-01-18 11:48:02 UTC
add event to alert human operator: https://github.com/kubevirt/kubevirt/pull/4952

*** Bug 1931803 has been marked as a duplicate of this bug. ***

To reproduce, follow the steps in the description:
1) start some VMs
2) add a label to the workload nodes
3) specify hco.spec.workloads.nodePlacement.nodeSelector to match the label from step 2
4) remove the label from one node (with VMs on it)
5) observe that virt-handler is still running on that node after the attempted label change

Note that, in addition to the alert, it will in the future also be possible to change the spec.workloads selectors without shutting down VMs (https://github.com/kubevirt/kubevirt/pull/5221), to actually resolve the issues reported by the alert.

Summary: virt-handler is NO LONGER running on the node after the attempted label change.
[kbidarka@localhost secureboot]$ oc get pods -n openshift-cnv | grep virt-handler
virt-handler-5cw4k 1/1 Running 0 4d23h
virt-handler-fjb2t 1/1 Running 0 4d23h
virt-handler-kcnw9 1/1 Running 0 4d23h
[kbidarka@localhost secureboot]$ oc get nodes
NAME STATUS ROLES AGE VERSION
node02.redhat.com Ready master 5d v1.21.0-rc.0+c656d63
node03.redhat.com Ready master 5d v1.21.0-rc.0+c656d63
node04.redhat.com Ready master 5d v1.21.0-rc.0+c656d63
node05.redhat.com Ready worker 5d v1.21.0-rc.0+c656d63
node06.redhat.com Ready worker 5d v1.21.0-rc.0+c656d63
node07.redhat.com Ready worker 5d v1.21.0-rc.0+c656d63
oc label node node05.redhat.com workload-comp=gpu-workload
oc label node node06.redhat.com workload-comp=gpu-workload
oc label node node07.redhat.com workload-comp=gpu-workload
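(The transcript does not show the matching workload placement being set on the HyperConverged CR; a minimal sketch of how that step might look, assuming the default CR name "kubevirt-hyperconverged" in the openshift-cnv namespace, is:)
# Hypothetical example: point spec.workloads.nodePlacement.nodeSelector at the label added above
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type merge \
  -p '{"spec":{"workloads":{"nodePlacement":{"nodeSelector":{"workload-comp":"gpu-workload"}}}}}'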
---
[kbidarka@localhost secureboot]$ oc get node node06.redhat.com -o yaml | grep workload
workload-comp: gpu-workload
[kbidarka@localhost secureboot]$ oc get node node05.redhat.com -o yaml | grep workload
workload-comp: gpu-workload
[kbidarka@localhost secureboot]$ oc get node node07.redhat.com -o yaml | grep workload
workload-comp: gpu-workload
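(Aside: the same check can be done in one step with a standard label selector; a minimal sketch:)
# List only the nodes carrying the workload label, with the label value shown as a column
oc get nodes -l workload-comp=gpu-workload -L workload-comp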
[kbidarka@localhost secureboot]$ oc get vmi
NAME AGE PHASE IP NODENAME
vm-rhel84 24s Running 70.xxx.2.236 node06.redhat.com
vm2-rhel84 23s Running 70.xxx.2.227 node07.redhat.com
vm2-rhel84-secref 22s Running 70.xxx.2.169 node05.redhat.com
---
[kbidarka@localhost secureboot]$ oc get pods -n openshift-cnv | grep virt-handler
virt-handler-c4sv4 1/1 Running 0 5m26s
virt-handler-qsfpx 1/1 Running 0 4m34s
virt-handler-wct2d 1/1 Running 0 67s
[kbidarka@localhost secureboot]$ oc label node node07.redhat.com workload-comp-
node/node07.redhat.com labeled
[kbidarka@localhost secureboot]$ oc get node node07.redhat.com -o yaml | grep workload
[kbidarka@localhost secureboot]$ oc get vmi
NAME AGE PHASE IP NODENAME
vm-rhel84 4m2s Running 70.xxx.2.236 node06.redhat.com
vm2-rhel84 4m1s Running 70.xxx.2.227 node07.redhat.com
vm2-rhel84-secref 4m Running 70.xxx.2.169 node05.redhat.com
[kbidarka@localhost secureboot]$ oc get pods -n openshift-cnv | grep virt-handler
virt-handler-c4sv4 1/1 Running 0 21m
virt-handler-qsfpx 1/1 Running 0 21m
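(Aside: to confirm which node lost its virt-handler pod, the DaemonSet pods can be listed together with their node assignment; a minimal sketch, assuming the usual kubevirt.io=virt-handler pod label:)
# Show each virt-handler pod and the node it is scheduled on
oc get pods -n openshift-cnv -l kubevirt.io=virt-handler -o wide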
@Ashley: As seen above, virt-handler is "no longer" running after removal of the label. ^
1) Can you please suggest what the expected behaviour is here?
2) Looking at the PRs linked in the bug, should we only expect to see an alert in the UI notifying the admin that there is an "orphaned VMI"?
3) Assuming that we would still continue to see "no virt-handler running" for the node from which the label was removed; need confirmation.
---
NOTE:
1) Tried looking in the UI for the alerts, but saw the message below:
AlertmanagerReceiversNotConfigured:
"Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur."
2) Will try to configure the "AlertmanagerReceivers".
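(Aside: firing alerts can also be inspected from the CLI without configured receivers; a minimal sketch, assuming the default openshift-monitoring Prometheus route, a token-based oc login, and jq installed:)
# Query the in-cluster Prometheus alerts API and print alert names and states
TOKEN=$(oc whoami -t)
PROM_HOST=$(oc get route prometheus-k8s -n openshift-monitoring -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer ${TOKEN}" "https://${PROM_HOST}/api/v1/alerts" \
  | jq '.data.alerts[] | {alert: .labels.alertname, state: .state}'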
We do see an alert fired in Prometheus after 1 hr.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920