Our documentation currently covers how to enable live migration for a VM and how to handle normal node maintenance operations. However, it seems we lack information on what happens (or should happen) when a node in the cluster crashes and goes down completely without warning. What should happen to the VMs when the node goes down? Should they restart, and after what delay? Also, are there any restrictions on VM availability (based on storage, etc.)?
This is a documentation bug only. Known issue: when a node is restarted in UPI bare metal, the VM does not automatically restart on another node. Auto restart is supported in IPI only. See the related Jira, CNV-7548.
Hi Jean-Francois, thanks for bringing this to our attention. I own the similar Jira story CNV-7443, where this work is being tracked, so I have chowned this bug as well. In that story, I am documenting how a user can manually ensure that the VMI fails over when a node fails and MHC is not enabled.
> Also, is there any restriction for the VM availability (based on storage, etc).
I do not know the answer to this but would like to document this type of conceptual information alongside the recovery steps, if possible. Fabian, what do you think?
(In reply to ctomasko from comment #4)
> This is a documentation bug, only. Known issue: When a node is restarted in
> UPI baremetal, the VM does not automatically restart on another node. Auto
> restart is supported in IPI only. See related jira CNV-7548
Is there any plan to support this feature, not just document UPI not supported?
> Is there any plan to support this feature, not just document UPI not supported?
I am afraid there is none. Supporting node recycling on UPI would be a major undertaking for OCP. Maybe Andrew Beekhof can point you to where you can ask for this in OpenShift.
(In reply to Dan Kenigsberg from comment #8)
> > Is there any plan to support this feature, not just document UPI not supported?
>
> I am afraid there is none. Supporting node recycling on UPI would be a major
> undertaking for OCP. Maybe Andrew Beekhof can point you to where you can ask
> for this in OpenShift.
Alberto Lamela, the Cloud team TL, would be a good place to start.
(In reply to Andrew Beekhof from comment #9)
> (In reply to Dan Kenigsberg from comment #8)
> > > Is there any plan to support this feature, not just document UPI not supported?
> >
> > I am afraid there is none. Supporting node recycling on UPI would be a major
> > undertaking for OCP. Maybe Andrew Beekhof can point you to where you can ask
> > for this in OpenShift.
>
> Alberto Lamela the Cloud team TL would be a good place to start
Thank you very much. I will try to reach out to him.
I have a PR ready for review which attempts to cover this BZ and the related Jira issue CNV-7443. I have already requested via Jira that either Fabian or Stu review it: https://github.com/openshift/openshift-docs/pull/26963 @jsaucier, I would also appreciate your review for the sake of this BZ and the related customer cases. Thanks!
@pousley I have added my review to the PR, thanks!
As I stated on https://issues.redhat.com/browse/CNV-7443, I have merged the PR after applying peer review feedback. Thanks to Fabian and Jean-Francois for your reviews as well. Vasiliy Sibirskiy has approved the PR from a QE perspective; Israel, can we move this to Verified? Thanks!
View the published docs here: https://docs.openshift.com/container-platform/4.6/virt/virtual_machines/virt-triggering-vm-failover-resolving-failed-node.html Closing this bug.