Bug 1344075 - [3.5] VM split brain during networking issues
Summary: [3.5] VM split brain during networking issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Francesco Romani
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On: 1339291 1342388
Blocks:
Reported: 2016-06-08 16:54 UTC by Michal Skrivanek
Modified: 2019-11-14 08:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Previously, when the VDSM service was restarted on a host, the host would still respond to queries from the Manager over the JSON-RPC protocol, which could lead to the status of virtual machines being reported incorrectly in the engine database. For highly available virtual machines, this could cause a virtual machine to be restarted under certain circumstances even though it was still running. This issue has now been resolved, and API calls are correctly blocked while the VDSM service is starting.
Clone Of: 1342388
Environment:
Last Closed: 2016-06-27 12:42:45 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1452393 | 0 | urgent | CLOSED | RHEV Guests are corrupted regularly | 2021-06-10 12:24:42 UTC
Red Hat Knowledge Base (Solution) | 2356491 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
Red Hat Product Errata | RHBA-2016:1342 | 0 | normal | SHIPPED_LIVE | vdsm 3.5.8 critical bug fix update | 2016-06-27 16:42:37 UTC
oVirt gerrit | 58409 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58462 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58463 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58464 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58465 | 0 | master | MERGED | rpc: Log important info from VM stats | 2016-06-27 08:00:50 UTC
oVirt gerrit | 58518 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58567 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58737 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58738 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58772 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58776 | 0 | None | None | None | 2016-06-08 16:54:18 UTC
oVirt gerrit | 58892 | 0 | ovirt-3.5 | MERGED | ignore incoming requests during recovery with json-rpc | 2016-06-13 13:28:52 UTC

Internal Links: 1452393

Description Michal Skrivanek 2016-06-08 16:54:18 UTC
+++ This bug was initially created as a clone of Bug #1342388 +++

+++ This bug is a RHEV-M zstream clone. The original bug is: +++
+++   https://bugzilla.redhat.com/show_bug.cgi?id=1339291. +++
+++ Requested by "mskrivan" +++

See the parent bug for details.

Comment 2 Francesco Romani 2016-06-13 13:31:05 UTC
https://gerrit.ovirt.org/#/c/58892/ merged -> MODIFIED
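
For readers without access to the patch: the merged change is titled "ignore incoming requests during recovery with json-rpc". The following is only a minimal sketch of that general idea, not the actual vdsm code. The names ApiGate, RecoveryInProgress, mark_recovered, dispatch and list_vms are all hypothetical and chosen for illustration.

```python
import threading


class RecoveryInProgress(Exception):
    """Hypothetical error returned while VM recovery has not finished."""


class ApiGate:
    """Hypothetical gate that refuses API calls until recovery completes."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_recovered(self):
        # Called once the already-running VMs have been recovered.
        self._ready.set()

    def dispatch(self, handler, *args):
        # Refuse the call rather than answer from incomplete state.
        if not self._ready.is_set():
            raise RecoveryInProgress("recovery in progress, try again later")
        return handler(*args)


def list_vms(known_vms):
    # Stand-in for a VM query; returns the VMs known at this moment.
    return sorted(known_vms)


if __name__ == "__main__":
    gate = ApiGate()
    known_vms = set()
    try:
        gate.dispatch(list_vms, known_vms)
    except RecoveryInProgress as exc:
        print("rejected:", exc)
    known_vms.add("vm-1")  # recovery finds the running VM
    gate.mark_recovered()
    print(gate.dispatch(list_vms, known_vms))
```

In this sketch a caller that polls during recovery gets an explicit error instead of an empty VM list, so it has no incomplete answer from which to conclude that a running VM is down.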

Comment 4 Nisim Simsolo 2016-06-22 15:09:07 UTC
Verification build:
rhevm-3.5.8-0.1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.491.el6_8.1.x86_64
libvirt-0.10.2-60.el6.x86_64
vdsm-4.16.37-1.el6ev.x86_64
sanlock-2.8-2.el6_5.x86_64

Verification scenarios:

# Add a 60-second sleep to /usr/share/vdsm/clientIf.py (the scenario that reproduced this bug before the fix):
1. Use 2 hosts in the same cluster. On the SPM host, edit /usr/share/vdsm/clientIf.py and add time.sleep(60) under def _recoverExistingVms(self):
2. Enable HA on the VM.
3. Run the VM.
4. Restart the VDSM service.
5. Verify that the VM does not migrate to the second host.
After the VDSM service has restarted, verify that the same qemu-kvm process is still running on the SPM host and that no qemu-kvm process for the same VM exists on the second host.
Verify that the VM continues to run properly.
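
The qemu-kvm process check in this scenario could be scripted roughly as follows. This is a sketch under assumptions, not part of the verification that was run: qemu_pids_for_vm is a hypothetical helper, it expects lines in the form produced by `ps -eo pid,args`, and it assumes the VM name follows a "-name" argument on the qemu-kvm command line.

```python
def qemu_pids_for_vm(ps_lines, vm_name):
    """Return PIDs of qemu-kvm processes whose command line names vm_name.

    ps_lines: lines of the form "<pid> <command line>".
    """
    pids = []
    for line in ps_lines:
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, args = int(parts[0]), parts[1:]
        is_qemu = "qemu-kvm" in args[0]
        # Assumption: the VM name is the value after "-name",
        # optionally followed by comma-separated options.
        names_vm = any(
            flag == "-name" and value.split(",")[0] == vm_name
            for flag, value in zip(args, args[1:])
        )
        if is_qemu and names_vm:
            pids.append(pid)
    return pids
```

Run against the process list of the SPM host before and after the restart, the helper should return the same single PID both times. Against the second host it should return an empty list.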

# Stop VDSM service:
1. Stop the VDSM service on the host with the running VM.
2. Wait for the host to become non-responsive and the VM to enter the unknown state.
3. Verify that soft fencing starts on the host and the VM status is restored to up.
4. Verify that the VM continues to run properly.

# Power off host:
1. Power off the host with the VM running on it.
2. Wait for the host to become non-responsive and the VM to enter the unknown state.
3. From webadmin, confirm 'host has been rebooted'.
4. Verify that the VM is migrating to the active host and is restarting.
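
The "wait for" steps in the last two scenarios were done by hand. If they were automated, a generic polling helper along these lines might be used. wait_for is a hypothetical name, and how the host or VM status is actually obtained is left to the caller as an assumption.

```python
import time


def wait_for(predicate, timeout, interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass a predicate such as "the VM status is unknown", backed by whatever status source is available in the test environment.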

Comment 6 errata-xmlrpc 2016-06-27 12:42:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1342

Comment 7 Marina Kalinin 2016-12-05 21:34:20 UTC
Fixed in vdsm-4.16.37-1.el6ev.x86_64, prior to 3.5.9.
Engine bug for 3.5.9:
https://bugzilla.redhat.com/show_bug.cgi?id=1352612

Comment 8 Marina Kalinin 2016-12-05 21:35:15 UTC
Sorry, the 3.5.9 bug is still vdsm-hostdeploy.

