Bug 1378967

Summary: VMs get paused when an FC port in the SAN changes status
Product: [oVirt] vdsm Reporter: Federico Sayd <fsayd>
Component: General Assignee: Nir Soffer <nsoffer>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Raz Tamir <ratamir>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.18.11 CC: amureini, bugs, fsayd, nsoffer
Target Milestone: ovirt-4.1.0-beta Flags: rule-engine: ovirt-4.1?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-20 18:28:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine.log, and vdsm.log, supervdsm.log, sanlock.log, messages from 2 hosts: rebooted host and host running vms that get paused none

Description Federico Sayd 2016-09-23 17:03:19 UTC
Created attachment 1204228
engine.log, and vdsm.log, supervdsm.log, sanlock.log, messages from 2 hosts: rebooted host and host running vms that get paused

Description of problem:


When any component in the FC SAN changes its status, e.g. when I reboot a host and its HBA ports come online, some VMs residing in the FC storage domain get automatically paused and I have to resume them manually.



Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Put a host in maintenance mode and reboot it
2. Wait until the host boots up and its FC HBA ports are activated
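
To see when the rebooted host's FC ports actually come online during step 2, the port state can be read from sysfs. The following is a minimal sketch, assuming the host exposes its HBAs under /sys/class/fc_host with a port_state file per host entry; the function names fc_port_states and all_ports_online are hypothetical helpers, not part of vdsm or oVirt.

```python
from pathlib import Path


def fc_port_states(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: port_state} for every FC host entry under sysfs_root.

    sysfs_root is a parameter so the function can be pointed at a test
    directory; the default is the assumed location on a Linux host with FC HBAs.
    """
    states = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No FC HBAs (or not a Linux host): report nothing rather than fail.
        return states
    for host_dir in sorted(root.iterdir()):
        state_file = host_dir / "port_state"
        if state_file.is_file():
            states[host_dir.name] = state_file.read_text().strip()
    return states


def all_ports_online(states):
    """True only if at least one port was found and every port reads 'Online'."""
    return bool(states) and all(state == "Online" for state in states.values())


if __name__ == "__main__":
    for host, state in fc_port_states().items():
        print(f"{host}: {state}")
```

Running this in a loop on the rebooted host while watching the engine for paused VMs would show whether the pauses line up with the port state transitions.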


Actual results:

Engine reports I/O problems and some VMs running on other hosts get paused. The VMs need to be resumed manually. Not all VMs sharing the same storage get paused.


Expected results:

Changes in the port status of SAN components should not be detected as I/O problems as long as the storage is accessible. VMs should not be paused.


Additional info:


Scenario:

8 virtualization hosts RHEL - 7 - 2.1511.el7.centos.2.10
2 IBM ds3512 storage arrays
2 paths per host, each connected to a QLogic(R) 10-port 4Gb SAN Switch Module for IBM BladeCenter(R)
4 LUNs connected as FC domains

Comment 1 Yaniv Kaul 2016-11-20 15:28:03 UTC
The pause is done by QEMU, not by any oVirt/RHV components. It is detected as an IO error - so I assume the multipathing is not handling the port state change correctly. 
Have you seen anything in the logs indicating otherwise?
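
One way to look for such evidence in the attached messages and vdsm.log files is a simple keyword scan around the time of the pause. This is only a sketch: the default search terms ("multipath", "i/o error") are assumptions, since the exact message wording depends on the kernel and multipathd versions, and find_matching_lines is a hypothetical helper name.

```python
import sys

# Assumed search terms; adjust to the wording your kernel/multipathd actually logs.
DEFAULT_KEYWORDS = ("multipath", "i/o error")


def find_matching_lines(lines, keywords=DEFAULT_KEYWORDS):
    """Return (line_number, line) pairs for lines containing any keyword.

    Matching is case-insensitive; line numbers start at 1.
    """
    lowered = [keyword.lower() for keyword in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(keyword in text for keyword in lowered):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python scan_log.py /path/to/logfile
    with open(sys.argv[1], errors="replace") as handle:
        for number, line in find_matching_lines(handle):
            print(f"{number}: {line}")
```

Comparing the timestamps of any hits on the host running the paused VMs with the reboot time of the other host would help confirm or rule out the multipath explanation.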

Comment 2 Yaniv Kaul 2016-12-20 18:28:11 UTC
(In reply to Yaniv Kaul from comment #1)
> The pause is done by QEMU, not by any oVirt/RHV components. It is detected
> as an IO error - so I assume the multipathing is not handling the port state
> change correctly. 
> Have you seen anything in the logs indicating otherwise?

Closing - please re-open if you have more details.