Bug 1548017 - [RFE] Detect write errors and resume paused vms
Summary: [RFE] Detect write errors and resume paused vms
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
: ---
Assignee: Dan Kenigsberg
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-22 14:29 UTC by Nir Soffer
Modified: 2020-04-01 14:51 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1335176 | 0 | unspecified | CLOSED | VMs do not auto-resume after short storage outage | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1548011 | 0 | unspecified | CLOSED | [RFE] Detect errors on any LUN and resume paused vms | 2021-02-22 00:41:40 UTC

Internal Links: 1335176 1548011

Description Nir Soffer 2018-02-22 14:29:53 UTC
Description of problem:

Vdsm storage monitoring uses only read(). If reading from storage succeeds but
writing to storage fails, a VM may pause while vdsm storage monitoring still
sees the storage domain as VALID. Since the storage domain does not change
state, we never resume the paused VM.
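
To illustrate the gap, here is a minimal sketch of a probe that checks both
read and write on a path. This is not vdsm's monitoring code; the names
(ProbeResult, probe, BLOCK_SIZE) are hypothetical, and the write probe
overwrites the first block, so it must only be pointed at a dedicated
monitoring file or volume. A real monitor would also need direct I/O so that
the page cache cannot hide a storage failure.

```python
import os
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed probe size, not taken from vdsm


@dataclass
class ProbeResult:
    read_ok: bool
    write_ok: bool

    @property
    def valid(self):
        # Storage counts as healthy only if both directions work.
        return self.read_ok and self.write_ok


def _attempt(path, flags, operation):
    """Open path with flags, run operation(fd), report success as a bool."""
    try:
        fd = os.open(path, flags)
    except OSError:
        return False
    try:
        operation(fd)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def probe(path):
    """Probe one path for read and write. The write is destructive."""
    read_ok = _attempt(
        path, os.O_RDONLY,
        lambda fd: os.pread(fd, BLOCK_SIZE, 0))
    # O_DSYNC makes the write wait for the device, so an I/O error
    # surfaces here instead of being deferred to writeback.
    write_ok = _attempt(
        path, os.O_WRONLY | os.O_DSYNC,
        lambda fd: os.pwrite(fd, b"\0" * BLOCK_SIZE, 0))
    return ProbeResult(read_ok, write_ok)
```

With a read-only check, a path where read_ok is True and write_ok is False
would still be reported as healthy, which is the situation described above.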

Version-Release number of selected component (if applicable):
Any

How reproducible:
Unknown

Steps to Reproduce:
1. Create a storage domain with 2 LUNs
2. Create enough disks to fill the first LUN
3. Create a new disk (should be created on the second LUN)
4. Start a VM with the new disk
5. Take the second LUN offline (can be done using sysfs)
6. Perform some I/O in the VM until the VM pauses
7. Make the second LUN available again
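
Steps 5 and 7 can be sketched as follows, assuming the LUN is backed by SCSI
devices that expose the device/state attribute in sysfs. The helper name and
the sysfs_root parameter are hypothetical (the parameter exists only so the
function can be exercised without root). On a multipath setup every path
device of the LUN would need the same treatment.

```python
import os

VALID_STATES = ("offline", "running")


def set_scsi_device_state(device, state, sysfs_root="/sys/block"):
    """Write state to <sysfs_root>/<device>/device/state.

    device is a kernel block device name such as "sdb" (an example name,
    not taken from the report). Requires root on a real system.
    """
    if state not in VALID_STATES:
        raise ValueError("state must be one of %r" % (VALID_STATES,))
    path = os.path.join(sysfs_root, device, "device", "state")
    with open(path, "w") as f:
        f.write(state + "\n")
    return path
```

For example, set_scsi_device_state("sdb", "offline") before step 6 and
set_scsi_device_state("sdb", "running") for step 7.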

Actual results:
The VM remains paused.

Expected results:
The VM is resumed.

Additional info:
Not tested, but vdsm cannot handle this case.

Possible solution:

1. Create a monitoring LV on every LUN when creating or extending a PV. LVM
   supports specifying a PV when creating a new LV.
2. Monitor the special monitoring LVs instead of the metadata LV for both read
   and write.
3. Use the monitoring LV state to change the status of a storage domain, or
   of the disks depending on these PVs.
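
A rough sketch of the three steps above. The LV naming scheme, the default
size, and all function names are assumptions made for illustration; only the
lvcreate form "lvcreate --name LV --size SIZE VG PV", which restricts
allocation to the given PV, is standard LVM usage. The command is built but
not executed here.

```python
import os


def monitor_lv_name(pv_path):
    # Hypothetical naming scheme: one monitoring LV per PV.
    return "monitor-" + os.path.basename(pv_path)


def create_monitor_lv_cmd(vg_name, pv_path, size="128m"):
    """Step 1: build an lvcreate command placing the LV on one PV."""
    return ["lvcreate",
            "--name", monitor_lv_name(pv_path),
            "--size", size,
            vg_name,
            pv_path]


def monitor_lv_path(vg_name, pv_path):
    """Step 2: the device node a read/write check would target."""
    return os.path.join("/dev", vg_name, monitor_lv_name(pv_path))


def disk_health(disk_pvs, pv_ok):
    """Step 3: a disk is healthy only if every PV it uses is healthy.

    disk_pvs maps a disk id to the PV paths holding its extents.
    pv_ok maps a PV path to the result of its latest read/write check.
    A PV with no result yet is treated as unhealthy.
    """
    return {disk: all(pv_ok.get(pv, False) for pv in pvs)
            for disk, pvs in disk_pvs.items()}
```

A disk whose entry changes from False to True would be a candidate for
resuming the VMs that use it, even if the storage domain as a whole never
changed state.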

Such a change requires moving storage monitoring out of vdsm, since the current
code cannot handle checking hundreds of paths.

Comment 1 Michal Skrivanek 2020-03-18 15:50:13 UTC
This bug hasn't received any attention for a while; we didn't have the capacity to make any progress. If you care deeply about it or want to work on it, please assign/target accordingly.

Comment 2 Michal Skrivanek 2020-03-18 15:52:47 UTC
This bug hasn't received any attention for a while; we didn't have the capacity to make any progress. If you care deeply about it or want to work on it, please assign/target accordingly.

Comment 3 Michal Skrivanek 2020-04-01 14:48:59 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

Comment 4 Michal Skrivanek 2020-04-01 14:51:57 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

