Description of problem:
The vdsm storage monitor uses only read(). If reading from storage succeeds but writing to storage fails, a VM may pause while vdsm storage monitoring still sees the storage domain as VALID. Since the storage domain never changes state, the paused VM is never resumed.

Version-Release number of selected component (if applicable):
Any

How reproducible:
Unknown

Steps to Reproduce:
1. Create a storage domain with 2 LUNs.
2. Create enough disks to fill the first LUN.
3. Create a new disk (it should be created on the second LUN).
4. Start a VM with the new disk.
5. Take the second LUN offline (this can be done using sysfs).
6. Perform some I/O in the VM until the VM pauses.
7. Make the second LUN available again.

Actual results:
The VM remains paused.

Expected results:
The VM is resumed.

Additional info:
Not tested, but vdsm cannot handle this case.

Possible solution:
1. Create a monitoring LV on every LUN when creating or extending a PV. LVM supports specifying a PV when creating a new LV.
2. Monitor the special monitoring LVs, instead of the metadata LV, for both read and write.
3. Use the monitoring LV state to change the status of the storage domain, or of the disks that depend on these PVs.

Such a change requires moving storage monitoring out of vdsm; the current code cannot handle checking hundreds of paths.
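A minimal Python sketch of steps 2 and 3 of the possible solution, assuming each LUN has a dedicated monitoring LV exposed as a path that can be opened read-write. The names `check_path`, `LunMonitor`, and `on_recovered` are hypothetical and are not vdsm APIs. The point it illustrates: because the check writes as well as reads, a LUN that rejects writes is reported invalid, and the invalid-to-valid transition is the event that could trigger resuming paused VMs. A real check would likely open the device with O_DIRECT (which requires aligned buffers) so the read-back is not served from the page cache; this sketch omits that so it also runs against a regular file.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

BLOCK_SIZE = 4096


def check_path(path: str, offset: int = 0) -> bool:
    """Write a block to path, flush it, and read it back.

    Returns True only if the open, write, flush and read all succeed and
    the data read back matches. Any OSError (for example EIO from an
    offline LUN) counts as a failure.
    """
    # Unique payload per check, padded to one block.
    payload = f"monitor {time.time_ns()}".encode().ljust(BLOCK_SIZE, b"\0")
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return False
    try:
        os.pwrite(fd, payload, offset)
        os.fsync(fd)  # surface write errors now, not at a later writeback
        data = os.pread(fd, BLOCK_SIZE, offset)
    except OSError:
        return False
    finally:
        os.close(fd)
    return data == payload


@dataclass
class LunMonitor:
    # Hypothetical mapping: LUN id -> path of that LUN's monitoring LV.
    paths: Dict[str, str]
    # Hypothetical hook, e.g. "resume VMs whose disks live on this LUN".
    on_recovered: Callable[[str], None]
    # Last known state per LUN; unseen LUNs are assumed valid.
    valid: Dict[str, bool] = field(default_factory=dict)

    def poll(self) -> List[str]:
        """Check every LUN once and return the LUNs that just recovered."""
        recovered = []
        for lun, path in self.paths.items():
            ok = check_path(path)
            was_ok = self.valid.get(lun, True)
            if ok and not was_ok:
                # Invalid -> valid transition: the state change the
                # read-only monitor never reports in this bug.
                recovered.append(lun)
                self.on_recovered(lun)
            self.valid[lun] = ok
        return recovered
```

Note that `poll` checks paths one after another with blocking I/O, so a single unresponsive LUN would stall the checks for all the others. That is the scalability concern raised above about checking hundreds of paths from within vdsm.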
This bug hasn't received any attention for a while; we didn't have the capacity to make any progress. If you care deeply about it or want to work on it, please assign/target it accordingly.
OK, closing. Please reopen if this is still relevant or you want to work on it.