Bug 1024353 - VM is not automatically unpaused after no space IO error on NFS
Status: CLOSED DEFERRED
Product: vdsm
Classification: oVirt
Component: General
Severity: low
Assigned To: Tal Nisan
Raz Tamir
Reported: 2013-10-29 09:32 EDT by Katarzyna Jachim
Modified: 2017-12-22 02:43 EST
Doc Type: Enhancement
Last Closed: 2017-07-16 05:30:33 EDT
Type: Bug
oVirt Team: Storage
ylavi: ovirt‑4.2+


Description Katarzyna Jachim 2013-10-29 09:32:07 EDT
Description of problem:
After an out-of-space I/O error on a storage domain, the VM is automatically paused. However, after the problem with the storage domain is fixed (by freeing some space), the VM is not automatically unpaused.


Version-Release number of selected component (if applicable): is20


How reproducible: 100% in automated tests


Steps to Reproduce:
1. create an NFS storage domain (the underlying storage should be rather small)
2. create a VM with THIN disk (let's say 20 GB) on this sd, install OS, boot it, start writing (just dd)
3. create a big (ca. free space on storage domain - 10 GB), PREALLOCATED disk on the storage domain
4. wait until the VM is paused
5. delete the disk added in point 3
6. wait until the VM is unpaused
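
The size used in step 3 can be worked out as in the following minimal Python sketch. It is not part of the automated test; the helper name `preallocated_disk_size`, the use of GiB, and the clamping to zero are assumptions made for illustration.

```python
import shutil

GIB = 1024 ** 3  # assumption: "GB" in the steps is treated as GiB here


def preallocated_disk_size(free_bytes: int, margin_bytes: int = 10 * GIB) -> int:
    """Size of the filler disk: free space minus a margin, never negative."""
    return max(free_bytes - margin_bytes, 0)


if __name__ == "__main__":
    # "." stands in for the mount point of the NFS storage domain.
    free = shutil.disk_usage(".").free
    print(preallocated_disk_size(free) // GIB, "GiB")
```

Leaving only about 10 GB free while the thin disk can grow to 20 GB is what makes the guest's writes eventually run out of space.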

Actual results:
the VM is never unpaused

Expected results:
the VM should be unpaused when there is free space on the sd

Additional info:
It may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1003588, but I prefer to be 100% sure, especially as this scenario works fine for iSCSI storage domains
Comment 3 Federico Simoncelli 2013-11-15 07:16:10 EST
This is probably not related to bug 1003588. I haven't looked at the logs well enough but the error here should be ENOSPC (instead of EIO).

I'm not sure if the feature was supposed to cover also ENOSPC errors as they're not related to domain availability.

The solution for ENOSPC is not related at all to the solution proposed for EIO (domain state change).

Ayal, what do you think?
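
The distinction made in this comment can be illustrated with a minimal Python sketch. This is not vdsm code; the function name `classify_io_error` and the returned labels are hypothetical and only show that the two errno values are separate cases.

```python
import errno


def classify_io_error(exc: OSError) -> str:
    """Map an OSError to a coarse category (labels are illustrative only)."""
    if exc.errno == errno.ENOSPC:
        # Out of space: the storage is reachable but has no room left.
        return "no-space"
    if exc.errno == errno.EIO:
        # Generic I/O failure: the case tied to domain availability.
        return "io-error"
    return "other"
```

A handler keyed on domain state changes would cover only the "io-error" branch; the "no-space" branch needs its own trigger.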
Comment 4 Ayal Baron 2013-11-17 07:03:08 EST
(In reply to Federico Simoncelli from comment #3)
> This is probably not related to bug 1003588. I haven't looked at the logs
> well enough but the error here should be ENOSPC (instead of EIO).
> 
> I'm not sure if the feature was supposed to cover also ENOSPC errors as
> they're not related to domain availability.
> 
> The solution for ENOSPC is not related at all to the solution proposed for
> EIO (domain state change).
> 
> Ayal, what do you think?

Indeed, we have no mechanism for this on NFS at the moment (on block domains we automatically extend and resume).
This is a gap we'll need to cover in 3.4.
Comment 10 Yaniv Lavi 2016-12-04 10:19:06 EST
This is not an RFE, it's a bug.
We can discuss if and when to fix it, but it's not correct to keep it on future.
Comment 11 Yaniv Kaul 2017-02-12 04:46:04 EST
Tal, who's going to work on it for 4.1?
Comment 12 Nir Soffer 2017-02-12 09:53:08 EST
We don't have a mechanism for unpausing vms on file storage, and our mechanism on
block storage is broken as well; there is no way to detect storage domain state
changes reliably.

Fixing this requires a major redesign, possible for 4.2 if we start to work on it
now.
Comment 13 Yaniv Kaul 2017-02-12 10:19:48 EST
(In reply to Nir Soffer from comment #12)
> We don't have a mechanism for unpausing vms on file storage, and our mechanism on
> block storage is broken as well; there is no way to detect storage domain state
> changes reliably.
> 
> Fixing this requires a major redesign, possible for 4.2 if we start to work on it
> now.

We could just retry periodically, I believe.
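
A periodic retry along these lines might look like the minimal Python sketch below. It is only an illustration under stated assumptions, not vdsm's implementation: `can_resume` and `resume` are hypothetical callbacks supplied by the caller, and checking free space with `shutil.disk_usage` is just one possible condition.

```python
import shutil
import time
from typing import Callable, Iterable, List


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def retry_resume(
    paused_vms: Iterable[str],
    can_resume: Callable[[str], bool],   # hypothetical check, e.g. free space
    resume: Callable[[str], None],       # hypothetical "continue VM" call
    interval_s: float = 10.0,
    max_rounds: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Periodically try to resume paused VMs; return those still paused."""
    pending = list(paused_vms)
    for round_no in range(max_rounds):
        still_paused = []
        for vm in pending:
            if can_resume(vm):
                resume(vm)
            else:
                still_paused.append(vm)
        pending = still_paused
        if not pending:
            break
        if round_no < max_rounds - 1:
            sleep(interval_s)  # wait before the next attempt
    return pending
```

Polling avoids the need to detect storage domain state changes, at the cost of a resume delay of up to one interval and repeated checks while the condition persists.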
Comment 14 Yaniv Lavi 2017-02-23 06:24:52 EST
Moving out all non-blockers/exceptions.
Comment 15 Nir Soffer 2017-02-28 18:42:12 EST
(In reply to Yaniv Dary from comment #14)
> Moving out all non-blockers/exceptions.

Yaniv, I don't think this makes sense for 4.1. Did you read comment 12?

We cannot fix problems like this at the last moment; this will only harm the
stability of the product. These kinds of issues must be scheduled at the start of
development for the next version.
Comment 16 Allon Mureinik 2017-07-16 05:30:33 EDT
(In reply to Nir Soffer from comment #12)
> We don't have a mechanism for unpausing vms on file storage, and our mechanism on
> block storage is broken as well; there is no way to detect storage domain state
> changes reliably.
> 
> Fixing this requires a major redesign, possible for 4.2 if we start to work on it
> now.

We have seen no such requests from the field, and I don't see us putting any effort into this area. Closing; if PM disagrees, please explain and reopen.
