Bug 1342388 - VM split brain during networking issues
Summary: VM split brain during networking issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.7
Target Release: 3.6.7
Assignee: Arik
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Duplicates: 1337203 1452393
Depends On: 1339291
Blocks: 1344075
 
Reported: 2016-06-03 07:22 UTC by rhev-integ
Modified: 2021-06-10 11:20 UTC
CC List: 22 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously during vdsm restart, the host would still respond to queries over JSON-RPC protocol from the Manager, which could result in the Manager reporting the incorrect virtual machine state. This could cause a highly available virtual machine to restart despite it already running. This has been fixed and the API calls are blocked during the vdsm service startup.
Clone Of: 1339291
Clones: 1344075
Environment:
Last Closed: 2016-06-29 16:20:35 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1452393 0 urgent CLOSED RHEV Guests are corrupted regularly 2021-06-10 12:24:42 UTC
Red Hat Knowledge Base (Solution) 2356491 0 None None None 2016-06-08 07:20:44 UTC
Red Hat Product Errata RHBA-2016:1364 0 normal SHIPPED_LIVE Red Hat Enterprise Virtualization Manager (rhevm) bug fix 3.6.7 2016-06-29 20:18:44 UTC
oVirt gerrit 58409 0 master MERGED core: log vms retrieved on statistics polling 2021-01-03 17:10:06 UTC
oVirt gerrit 58462 0 master MERGED jsonrpc: Fix log level overriding of some methods 2021-01-03 17:10:08 UTC
oVirt gerrit 58463 0 master MERGED rpc: Lower logging priority just for getAllVmStats 2021-01-03 17:10:06 UTC
oVirt gerrit 58464 0 master MERGED rpc: Log calls of API methods with possibly large results 2021-01-03 17:10:06 UTC
oVirt gerrit 58465 0 master MERGED rpc: Log important info from VM stats 2021-01-03 17:10:09 UTC
oVirt gerrit 58518 0 master MERGED ignore incoming requests during recovery with json-rpc 2021-01-03 17:10:45 UTC
oVirt gerrit 58567 0 master MERGED core: refine log for retrieved vms on statistics cycle 2021-01-03 17:10:10 UTC
oVirt gerrit 58737 0 None MERGED ignore incoming requests during recovery with json-rpc 2021-01-03 17:10:10 UTC
oVirt gerrit 58738 0 None MERGED ignore incoming requests during recovery with json-rpc 2021-01-03 17:10:10 UTC
oVirt gerrit 58772 0 ovirt-engine-3.6 MERGED core: log vms retrieved on statistics polling 2021-01-03 17:10:10 UTC
oVirt gerrit 58776 0 ovirt-engine-3.6.7 MERGED core: log vms retrieved on statistics polling 2021-01-03 17:10:08 UTC

Internal Links: 1452393
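
The Doc Text above summarizes the fix: while vdsm is still recovering existing VMs after a restart, incoming JSON-RPC API calls are no longer answered with incomplete state. The following minimal Python sketch illustrates that idea only; the class, method names, and error message are assumptions for illustration, not the actual vdsm code:

    # Illustrative sketch only -- not the actual vdsm implementation.
    # While recovery of existing VMs is still in progress after a restart,
    # API calls are refused instead of being answered with incomplete
    # (and therefore misleading) VM state.

    import threading
    import time


    class RecoveryError(Exception):
        """Raised for API calls received before recovery has finished."""


    class ApiServer:
        def __init__(self):
            self._recovered = threading.Event()
            self._vms = {}

        def start(self):
            # Recovery runs in the background; until it completes, the
            # server is reachable but must not report VM state.
            threading.Thread(target=self._recover_existing_vms).start()

        def _recover_existing_vms(self):
            time.sleep(2)                      # stand-in for the real recovery work
            self._vms = {"vm1": "Up"}          # hypothetical recovered state
            self._recovered.set()

        def get_all_vm_stats(self):
            if not self._recovered.is_set():
                # Before the fix, an empty or partial result would be returned
                # here, letting the engine conclude the HA VM was down.
                raise RecoveryError("recovery in progress")
            return self._vms


    if __name__ == "__main__":
        server = ApiServer()
        server.start()
        try:
            server.get_all_vm_stats()
        except RecoveryError as exc:
            print("call rejected during recovery:", exc)
        time.sleep(3)
        print("after recovery:", server.get_all_vm_stats())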

Comment 2 Francesco Romani 2016-06-07 12:18:48 UTC
58738 merged -> MODIFIED

Comment 3 Francesco Romani 2016-06-07 12:20:19 UTC
The Vdsm changes do not require doc_string updates.

Comment 4 Michal Skrivanek 2016-06-07 13:14:10 UTC
One more patch needs to get in :) https://gerrit.ovirt.org/#/c/58465/

Comment 5 Arik 2016-06-13 22:02:13 UTC
*** Bug 1337203 has been marked as a duplicate of this bug. ***

Comment 6 Nisim Simsolo 2016-06-27 14:04:22 UTC
Verification builds:
rhevm-3.6.7.5-0.1.el6
libvirt-client-1.2.17-13.el7_2.5.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64
vdsm-4.17.31-0.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64

Verification scenarios:

# Add a 60-second sleep to /usr/share/vdsm/clientIf.py (the scenario for reproducing this bug before the fix; a sketch is shown after this scenario):
1. Use 2 hosts under the same cluster. On the SPM host, edit /usr/share/vdsm/clientIf.py and add time.sleep(60) under def _recoverExistingVms(self):
2. Enable HA on the VM.
3. Run the VM.
4. Restart the vdsmd service (look for "VM is running in db and not running in VDS 'hostname'" in engine.log).
5. Verify the VM is not migrated to the second host.
After the VDSM service has restarted, verify the same qemu-kvm process is still running on the SPM host and that no qemu-kvm process for the same VM exists on the second host.
Verify the VM continues to run properly.
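
For reference, a minimal sketch of the step-1 tweak; the class and method below are only a skeleton standing in for vdsm's real clientIf.py, and the added time.sleep(60) line is the only intended change:

    # /usr/share/vdsm/clientIf.py -- reproduction aid only (illustrative skeleton).
    # Only the added time.sleep(60) reflects the actual instruction; it widens
    # the window in which the engine polls the host while recovery is ongoing.

    import time


    class clientIF(object):
        def _recoverExistingVms(self):
            time.sleep(60)  # added for the reproducer: delay VM recovery
            # ... original recovery logic continues here ...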

# Stop the VDSM service:
1. Stop the VDSM service on the host with the running VM.
2. Wait for the host to become non-responsive and the VM to go to Unknown state.
3. Verify soft fencing is started on the host and the VM status is restored to Up.
4. Verify the VM continues to run properly.

# Power off the host:
1. Power off the host with the VM running on it.
2. Wait for the host to become non-responsive and the VM to go to Unknown state.
3. From the webadmin, confirm 'Host has been rebooted'.
4. Verify the VM is restarted on the other active host.

Comment 8 errata-xmlrpc 2016-06-29 16:20:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1364

Comment 9 Adam Litke 2017-07-31 15:48:59 UTC
*** Bug 1452393 has been marked as a duplicate of this bug. ***

