Bug 1378967 - Vms get paused when a FC port in SAN change status
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.18.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.0-beta
Assignee: Nir Soffer
QA Contact: Raz Tamir
Reported: 2016-09-23 17:03 UTC by Federico Sayd
Modified: 2019-10-24 12:22 UTC

Last Closed: 2016-12-20 18:28:11 UTC
oVirt Team: Storage
rule-engine: ovirt-4.1?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
engine.log, vdsm.log, supervdsm.log, sanlock.log, and messages from 2 hosts: the rebooted host and the host running the VMs that get paused (5.85 MB, application/x-gzip)
2016-09-23 17:03 UTC, Federico Sayd

Description Federico Sayd 2016-09-23 17:03:19 UTC
Created attachment 1204228
engine.log, vdsm.log, supervdsm.log, sanlock.log, and messages from 2 hosts: the rebooted host and the host running the VMs that get paused

Description of problem:


When any component in the FC SAN changes its status (for example, when I reboot a host and its HBA ports come online), some VMs residing on the FC storage domain are automatically paused, and I have to resume them manually.



Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Put a host in maintenance mode and reboot it.
2. Wait until the host boots up and its FC HBA ports are activated.


Actual results:

The engine reports I/O problems, and some VMs running on other hosts get paused. These VMs need to be resumed manually. Not all VMs sharing the same storage get paused.


Expected results:

Changes in the port status of SAN components should not be detected as I/O problems as long as the storage remains accessible. VMs should not be paused.


Additional info:


Scenario:

8 virtualization hosts (RHEL - 7 - 2.1511.el7.centos.2.10)
2 IBM DS3512 storage arrays
2 paths per host, each connected to a QLogic(R) 10-port 4Gb SAN Switch Module for IBM BladeCenter(R)
4 LUNs connected as FC domains

Comment 1 Yaniv Kaul 2016-11-20 15:28:03 UTC
The pause is done by QEMU, not by any oVirt/RHV component. It is detected as an I/O error, so I assume multipathing is not handling the port state change correctly.
Have you seen anything in the logs indicating otherwise?
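
For anyone gathering the details requested above, here is a minimal sketch, assuming a Linux host that exposes FC HBA state under /sys/class/fc_host/<host>/port_state. The function names are hypothetical and not part of vdsm. It only snapshots port states so that a change can be matched against the time a VM was paused; it does not diagnose multipath behaviour.

```python
from pathlib import Path


def fc_port_states(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: port_state} for every FC host under sysfs_root.

    Assumes the usual sysfs layout (one directory per HBA port, each with
    a 'port_state' file). Returns an empty dict if the directory is absent.
    """
    states = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return states  # no FC HBAs visible on this machine
    for host_dir in sorted(root.iterdir()):
        state_file = host_dir / "port_state"
        if state_file.is_file():
            states[host_dir.name] = state_file.read_text().strip()
    return states


def changed_ports(before, after):
    """Return {host: (old_state, new_state)} for ports that differ
    between two snapshots taken with fc_port_states()."""
    return {
        host: (before.get(host), after.get(host))
        for host in sorted(set(before) | set(after))
        if before.get(host) != after.get(host)
    }
```

Taking one snapshot before rebooting the other host and another after its HBA ports come up, then passing both to changed_ports(), would show whether any local port changed state around the time of the pause.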

Comment 2 Yaniv Kaul 2016-12-20 18:28:11 UTC
(In reply to Yaniv Kaul from comment #1)
> The pause is done by QEMU, not by any oVirt/RHV components. It is detected
> as an IO error - so I assume the multipathing is not handling the port state
> change correctly. 
> Have you seen anything in the logs indicating otherwise?

Closing - please re-open if you have more details.

