Bug 1123395

Summary: vdsm on SPM crashes during RHS failover delay
Product: Red Hat Enterprise Virtualization Manager Reporter: Allie DeVolder <adevolder>
Component: vdsm Assignee: Nir Soffer <nsoffer>
Status: CLOSED NOTABUG QA Contact: Aharon Canan <acanan>
Severity: high Docs Contact:
Priority: high    
Version: 3.4.0 CC: adevolder, amureini, bazulay, ecohen, iheim, lpeer, scohen, yeylon
Target Milestone: --- Keywords: Triaged
Target Release: 3.5.0   
Hardware: All   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-02 12:32:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Allie DeVolder 2014-07-25 14:09:40 UTC
Description of problem:
The customer disconnected an RHS node to perform maintenance, which triggered the 42-second RHS timeout that precedes failover. While the storage was hung, sanlock reported errors and vdsm crashed on the SPM. Other hosts were not affected.

Version-Release number of selected component (if applicable):


How reproducible:
Unknown

Steps to Reproduce:
1. Create a RHEV storage domain using RHS storage with 2 nodes.
2. Shut down one of the RHS nodes, triggering the 42-second failover timeout.

Actual results:
The SPM reports sanlock errors, then vdsm crashes and respawns, finally resulting in a fencing event.

Expected results:
Paused VMs, latency warnings in Events tab, followed by recovery