Bug 2213657
| Summary: | Warn admin if cluster is not set up for high availability | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Sean Haselden <shaselde> |
| Component: | Infrastructure | Assignee: | Karel Šimon <ksimon> |
| Status: | NEW --- | QA Contact: | Geetika Kapoor <gkapoor> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.12.2 | CC: | fdeutsch, gkapoor, ycui |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Description
Sean Haselden
2023-06-08 21:41:36 UTC

Case 1: The VM does not support live migration, the node is down, and no mechanism exists to bring the node back up.

We see the following alerts:

VMCannotBeEvicted
Eviction policy for fedora-dcl8kcbo31ttq3lf (on node c01-gkbug412-dbgc7-worker-0-qbr6s) is set to Live Migration but the VM is not migratable

KubeNodeUnreachable
c01-gkbug412-dbgc7-worker-0-qbr6s is unreachable and some workloads may be rescheduled.

KubeNodeNotReady
c01-gkbug412-dbgc7-worker-0-qbr6s has been unready for more than 15 minutes.

Case 2: The VM supports live migration, the node is down, and no mechanism exists to bring the node back up.

Set the storage class (sc) to ocs-storagecluster-ceph-rbd.

$ oc get pvc -A | grep t5dwa7ddhh02vall
testing   fedora-t5dwa7ddhh02vall   Bound   pvc-3d2f5c1d-d43d-42cf-a96b-a00af7a12a09   30Gi   RWX   ocs-storagecluster-ceph-rbd   16m

$ oc get vmi -A
NAMESPACE   NAME                      AGE    PHASE     IP             NODENAME                            READY
testing     fedora-t5dwa7ddhh02vall   7m4s   Running   10.129.2.144   c01-gkbug412-dbgc7-worker-0-qbr6s   True

The VM is running without VMEviction alerts.

Shut down the node. The VM goes to Ready=False.

$ oc get vmi -A
NAMESPACE   NAME                      AGE    PHASE     IP             NODENAME                            READY
testing     fedora-t5dwa7ddhh02vall   9m4s   Running   10.129.2.144   c01-gkbug412-dbgc7-worker-0-qbr6s   False

Monitor whether the VM gets started on a different node.

$ oc get vmi -A
NAMESPACE   NAME                      AGE   PHASE     IP             NODENAME                            READY
testing     fedora-t5dwa7ddhh02vall   13m   Running   10.129.2.144   c01-gkbug412-dbgc7-worker-0-qbr6s   False

Note: It has been 4 minutes since the VM went down, and no alerts have been seen so far. This is a problem. Alerts from the network side and the storage side are being raised because the node is not up, but no alert is raised for a VM that can migrate and is nevertheless down. As a user, I might want to see an alert if my VM is not up.
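
Until such an alert exists, the manual check from Case 2 (looking for a VMI that stays Ready=False after its node goes down) could be scripted roughly as follows. This is only a sketch: the function name not_ready_vmis is hypothetical, and it assumes the column layout of the `oc get vmi -A` output in this report, with READY as the last column and NODENAME immediately before it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every VMI whose READY column is "False".
# It reads `oc get vmi -A` style output on stdin, so the filter can be
# tried without a cluster. Assumed columns (as in this report):
#   NAMESPACE NAME AGE PHASE IP NODENAME READY
not_ready_vmis() {
  # The header row ends in "READY", so it never matches "False".
  awk '$NF == "False" { printf "%s/%s on %s\n", $1, $2, $(NF-1) }'
}

# Against a live cluster (requires oc and a logged-in session):
#   oc get vmi -A | not_ready_vmis
```

Run periodically, for example from a cron job or a watch loop, this would report the VMI from Case 2 for as long as it remains on the downed node with Ready=False. It does not replace a proper cluster alert and says nothing about why the VMI is not ready.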