Bug 1317429 - [RFE] Improve HA failover, so that even when power fencing is not available, automatic HA will work without manual confirmation on host rebooted.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RFEs
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-4.1.0-beta
Target Release: ---
Assignee: Nir Soffer
QA Contact: Lilach Zitnitski
URL: http://www.ovirt.org/develop/release-...
Whiteboard:
Depends On: 1406765 1410320 1412230 1415488
Blocks: 804272 1421432
Reported: 2016-03-14 09:03 UTC by Yaniv Lavi
Modified: 2017-03-02 01:30 UTC
CC List: 30 users

Fixed In Version:
Clone Of: 804272
Environment:
Last Closed: 2017-02-15 15:05:22 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: exception+
ylavi: priority_rfe_tracking+
gklein: testing_plan_complete+
ylavi: planning_ack+
amureini: devel_ack+
ratamir: testing_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 65444 0 master MERGED xleases: Introduce the xlease module 2016-12-01 17:23:59 UTC
oVirt gerrit 65465 0 master MERGED vm: Support vm leases 2016-12-22 16:32:16 UTC
oVirt gerrit 66603 0 master ABANDONED xleases: Use utils.closing instead of reinventing it 2016-12-01 23:10:28 UTC
oVirt gerrit 66604 0 master ABANDONED xleases: Add logging 2016-12-01 23:10:41 UTC
oVirt gerrit 67347 0 master MERGED xleases: Add and remove sanlock resource 2016-12-05 09:33:11 UTC
oVirt gerrit 67349 0 master MERGED xleases: Use six.PY2 instead of reinventing it 2016-12-05 09:33:26 UTC
oVirt gerrit 67380 0 master MERGED xleases: Add failing tests for storage operations 2016-12-05 09:35:01 UTC
oVirt gerrit 67381 0 master MERGED xleases: Robust sanlock resources management 2016-12-05 13:14:07 UTC
oVirt gerrit 67609 0 master MERGED xleases: Update leases volume format 2016-12-07 06:27:05 UTC
oVirt gerrit 67610 0 master MERGED xleases: Make VolumeLeaseStatus more general 2016-12-14 12:32:15 UTC
oVirt gerrit 67611 0 master MERGED xleases: Introduce the Lease API's 2016-12-22 16:32:00 UTC
oVirt gerrit 67717 0 master MERGED xleases: Support SPM verbs without pool uuid 2016-12-08 22:09:07 UTC
oVirt gerrit 68046 0 master MERGED xleases: Move LeasesVolume.format to format_index 2016-12-10 15:43:32 UTC
oVirt gerrit 68047 0 master MERGED xleases: Read and write index metadata 2016-12-20 17:16:56 UTC
oVirt gerrit 68067 0 master MERGED xleases: Cleanup free records writing and lookup 2016-12-20 20:32:37 UTC
oVirt gerrit 68069 0 master MERGED xleases: Separate VolumeIndex loading from file 2016-12-20 20:53:59 UTC
oVirt gerrit 68072 0 master MERGED xleases: Prevent use of index during update 2016-12-20 21:12:48 UTC
oVirt gerrit 68075 0 master MERGED xleases: Create and activate the external leases volume 2016-12-22 16:16:08 UTC
oVirt gerrit 68085 0 master MERGED xleases: Implement basic leases APIs 2016-12-22 16:32:09 UTC
oVirt gerrit 68762 0 ovirt-engine-4.1 MERGED core: minor refactoring in handling of vds network exceptions 2016-12-20 20:01:51 UTC
oVirt gerrit 69020 0 ovirt-4.1 MERGED xleases: Create and activate the external leases volume 2016-12-27 07:34:25 UTC
oVirt gerrit 69021 0 ovirt-4.1 MERGED xleases: Introduce the Lease API's 2016-12-27 07:34:59 UTC
oVirt gerrit 69022 0 ovirt-4.1 MERGED xleases: Implement basic leases APIs 2016-12-27 07:35:22 UTC
oVirt gerrit 69023 0 ovirt-4.1 MERGED vm: Support vm leases 2016-12-27 07:35:44 UTC
oVirt gerrit 69187 0 master MERGED xleases: Add DirectFile.size() method 2017-01-05 17:52:08 UTC
oVirt gerrit 69188 0 master MERGED xleases: Add VolumeIndex.updating context manager 2017-01-05 17:52:04 UTC
oVirt gerrit 69189 0 master MERGED xleases: Implement rebuild_index 2018-12-03 23:37:33 UTC
oVirt gerrit 69190 0 master MERGED xleases: Wire the rebuild_leases API 2018-12-03 23:37:36 UTC
oVirt gerrit 69251 0 ovirt-engine-4.1 MERGED core: remove unneeded query when getting vms to move to unknown 2016-12-29 09:41:49 UTC
oVirt gerrit 69252 0 ovirt-engine-4.1 MERGED core: ability to run vms in unknown status 2016-12-29 09:41:26 UTC
oVirt gerrit 69253 0 ovirt-engine-4.1 MERGED core: add vm leases 2016-12-29 09:41:38 UTC
oVirt gerrit 69254 0 ovirt-engine-4.1 MERGED core: add and remove vds commands for vm leases 2016-12-29 09:41:17 UTC
oVirt gerrit 69255 0 ovirt-engine-4.1 MERGED core: remove vm lease on remove vm 2016-12-29 09:41:09 UTC
oVirt gerrit 69256 0 ovirt-engine-4.1 MERGED core: add vm lease on add/edit/import vm 2016-12-29 09:40:51 UTC
oVirt gerrit 69257 0 ovirt-engine-4.1 MERGED core: auto start vms with lease 2016-12-29 09:40:56 UTC
oVirt gerrit 69258 0 ovirt-engine-4.1 MERGED core: send vm lease on run vm 2016-12-29 09:41:02 UTC
oVirt gerrit 69259 0 ovirt-engine-4.1 MERGED webadmin: ability to set vm leases 2017-01-01 13:08:04 UTC
oVirt gerrit 69336 0 master MERGED pylint: storage/fileSD: fix typo 2017-01-01 07:48:56 UTC
oVirt gerrit 69343 0 master MERGED pylint: fix storage.sd.SDManifest 2017-01-02 13:12:16 UTC
oVirt gerrit 69349 0 master MERGED vm: Add the missing VmLeaseDevice type 2017-01-06 09:22:10 UTC
oVirt gerrit 69356 0 ovirt-4.1 MERGED pylint: storage/fileSD: fix typo 2017-02-01 15:11:53 UTC
oVirt gerrit 69394 0 master MERGED core: initial delay before automatic start of vm with lease 2017-01-05 08:53:15 UTC
oVirt gerrit 69397 0 master MERGED core: automatically start vms with lease by their priority 2017-01-12 09:21:40 UTC
oVirt gerrit 69414 0 master MERGED core: vm leases are supported since 4.1 2017-01-11 07:44:02 UTC
oVirt gerrit 69428 0 ovirt-4.1 MERGED pylint: fix storage.sd.SDManifest 2017-02-01 15:09:57 UTC
oVirt gerrit 69537 0 master MERGED core: refactoring in vm analyzer 2017-01-05 08:05:52 UTC
oVirt gerrit 69538 0 master MERGED core: refactoring vm analyzer 2017-01-05 16:18:18 UTC
oVirt gerrit 69539 0 master MERGED core: better support in the monitoring for auto start of vms with lease 2017-01-05 16:18:31 UTC
oVirt gerrit 69540 0 master MERGED core: increase the interval between retries to auto start vm with a lease 2017-01-05 16:18:57 UTC
oVirt gerrit 69678 0 ovirt-engine-4.1 MERGED core: refactoring in vm analyzer 2017-01-05 09:50:43 UTC
oVirt gerrit 69679 0 ovirt-engine-4.1 MERGED core: initial delay before automatic start of vm with lease 2017-01-05 09:51:29 UTC
oVirt gerrit 69680 0 ovirt-engine-4.1 MERGED core: vm leases are supported since 4.1 2017-01-15 14:07:42 UTC
oVirt gerrit 69681 0 ovirt-engine-4.1 MERGED core: refactoring vm analyzer 2017-01-15 14:07:29 UTC
oVirt gerrit 69682 0 ovirt-engine-4.1 MERGED core: automatically start vms with lease by their priority 2017-01-15 14:03:23 UTC
oVirt gerrit 69683 0 ovirt-engine-4.1 MERGED core: better support in the monitoring for auto start of vms with lease 2017-01-15 14:03:29 UTC
oVirt gerrit 69684 0 ovirt-engine-4.1 MERGED core: increase the interval between retries to auto start vm with a lease 2017-01-15 14:07:34 UTC
oVirt gerrit 69808 0 ovirt-engine-4.1 MERGED core: do not copy lease_sd_id from vm to template 2017-01-09 08:56:51 UTC
oVirt gerrit 69823 0 ovirt-engine-4.1 MERGED core: do not copy lease_sd_id from vm to template 2017-01-11 09:55:10 UTC
oVirt gerrit 70975 0 master MERGED spec: Update libvirt requirement on EL7 2017-01-23 23:13:07 UTC
oVirt gerrit 71062 0 ovirt-4.1 MERGED spec: Update libvirt requirement on EL7 2017-01-24 11:22:12 UTC
oVirt gerrit 71647 0 ovirt-4.1 MERGED vm: Add the missing VmLeaseDevice type 2017-02-08 16:33:13 UTC

Description Yaniv Lavi 2016-03-14 09:03:40 UTC
Improve HA failover so that, even when power fencing is not available, automatic HA works without manually confirming that the host has been rebooted. We need to provide a way to restart VMs and move the SPM role to a running server when power fencing fails.

Power fencing can fail for various reasons:
1. A power outage leaves the iLO/DRAC (or whatever device is used) unreachable
2. A network outage also makes the power fencing device unreachable
3. Strange system failures that also affect the power fencing device
4. Misconfiguration, e.g. of firewalls

All of these should lead to the VMs running on other hypervisors afterwards, so that they are reachable again. Therefore we need to make sure that the host that previously ran the VM has no chance of reaching the storage anymore, and as such cannot do any harm to the data.
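
One way to picture the requirement above is a storage-backed lease that the running host must keep renewing. The following is only a minimal sketch under stated assumptions, not the vdsm or sanlock implementation: the `Lease` class, its method names, and the 80-second duration are all hypothetical, and the sketch assumes the old host stops the VM once it can no longer renew the lease.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Lease:
    """Toy model of a per-VM lease kept on shared storage (hypothetical API)."""
    duration: float                 # seconds a renewal stays valid (assumed value)
    holder: Optional[str] = None    # host that currently holds the lease
    renewed_at: float = 0.0         # time of the holder's last successful renewal

    def expired(self, now: float) -> bool:
        # A lease nobody holds, or one not renewed within `duration`, is free.
        return self.holder is None or now - self.renewed_at >= self.duration

    def renew(self, host: str, now: float) -> bool:
        # Only the current holder can renew, and only before the lease expires.
        # A host that lost access to storage cannot call this successfully.
        if self.holder == host and not self.expired(now):
            self.renewed_at = now
            return True
        return False

    def acquire(self, host: str, now: float) -> bool:
        # Another host may take over only after the old holder stopped renewing.
        if self.expired(now):
            self.holder, self.renewed_at = host, now
            return True
        return False


# Usage: host-a runs the VM, then loses storage access after t=50.
lease = Lease(duration=80.0)
lease.acquire("host-a", now=0.0)
lease.renew("host-a", now=50.0)
too_early = lease.acquire("host-b", now=100.0)   # lease is still valid
after_expiry = lease.acquire("host-b", now=130.0)  # lease has lapsed
print(too_early, after_expiry, lease.holder)
```

In this model the VM can be restarted elsewhere without power fencing only once the lease has lapsed, which is the point at which the previous host is assumed to be unable to write to the storage.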

Comment 1 Allon Mureinik 2016-03-14 12:16:18 UTC
We need to finalize the design; marking that we haven't completed it yet. Once the design is finalized, we can devel ack/nack accordingly.

Comment 2 Sandro Bonazzola 2016-05-02 10:09:37 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 4 Nir Soffer 2016-11-23 12:25:28 UTC
Here is the storage-side feature page:
http://www.ovirt.org/develop/release-management/features/storage/vm-leases/

On top of this there is the virt-side feature page (in review):
https://github.com/oVirt/ovirt-site/pull/586

Comment 6 Nir Soffer 2016-12-01 17:29:50 UTC
We are not finished yet, moving back to POST.

Comment 7 Yaniv Lavi 2017-01-04 16:28:22 UTC
Arik, can you please open a blocking bug on API for the feature?

Comment 8 Tal Nisan 2017-01-18 11:33:11 UTC
The REST API bug was opened and has already been resolved.

Comment 9 Nir Soffer 2017-01-22 14:56:45 UTC
Added a patch to require the libvirt version that allows working with VM leases.

Moving back to POST until this patch is merged (should be quick).

