Bug 1130740 - Faulty storage check when running vm with disks
Summary: Faulty storage check when running vm with disks
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.5.0
Assignee: Allon Mureinik
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2014-08-17 09:09 UTC by Ori Gofen
Modified: 2016-02-10 17:39 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-27 16:52:00 UTC
oVirt Team: Storage
Embargoed:


Attachments
vdsm+engine logs (1.98 MB, application/gzip)
2014-08-17 09:09 UTC, Ori Gofen

Description Ori Gofen 2014-08-17 09:09:10 UTC
Created attachment 927425 [details]
vdsm+engine logs

Description of problem:

When running a VM with thin-provisioned disks on a domain with a free-space deficit (disk virtual size > domain's free space), writes to the VM's disk proceed until vdsm fails to extend the volume, producing multiple tracebacks.

9f9e08df-1e64-4c7b-a073-05eaeb11af51::ERROR::2014-08-17 11:39:37,021::storage_mailbox::172::Storage.SPM.Messages.Extend::(processRequest) processRequest: Exception caught while trying to extend volume: b574fb8b-f1f3-4bf8-b082-078888fd2627 in domain: dd4677a6-8d22-4d6b-837f-c5556dd15ab2
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 166, in processRequest
    pool.extendVolume(volume['domainID'], volume['volumeID'], size)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1300, in extendVolume
    sdCache.produce(sdUUID).extendVolume(volumeUUID, size, isShuttingDown)
  File "/usr/share/vdsm/storage/blockSD.py", line 1315, in extendVolume
    lvm.extendLV(self.sdUUID, volumeUUID, size)  # , isShuttingDown)
  File "/usr/share/vdsm/storage/lvm.py", line 1143, in extendLV
    _resizeLV("lvextend", vgName, lvName, size)
  File "/usr/share/vdsm/storage/lvm.py", line 1137, in _resizeLV
    free_size / constants.MEGAB))
VolumeGroupSizeError: Volume Group not big enough: ('dd4677a6-8d22-4d6b-837f-c5556dd15ab2/b574fb8b-f1f3-4bf8-b082-078888fd2627 4096 > 512 (MiB)',)

861c4431-b605-4ae3-9421-092e7cc3fc0c::ERROR::2014-08-17 11:39:38,217::task::866::Storage.TaskManager.Task::(_setError) Task=`b3b2eaea-00e2-4b5b-992d-fdec1d24b9bf`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/virt/vm.py", line 2630, in __afterVolumeExtension
    apparentSize, trueSize = self.__verifyVolumeExtension(volInfo)
  File "/usr/share/vdsm/virt/vm.py", line 2603, in __verifyVolumeExtension
    (volInfo['name'], volInfo['domainID'], volInfo['volumeID']))

861c4431-b605-4ae3-9421-092e7cc3fc0c::ERROR::2014-08-17 11:39:38,221::threadPool::209::Storage.ThreadPool.WorkerThread::(_processNextTask) Task <function runTask at 0x20b9848> failed
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/threadPool.py", line 201, in _processNextTask
    cmd(args)
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 79, in runTask
    ctask.prepare(cmd, *args)
  File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1179, in prepare
    raise self.error
RuntimeError: Volume extension failed for vdb (domainID: dd4677a6-8d22-4d6b-837f-c5556dd15ab2, volumeID: b574fb8b-f1f3-4bf8-b082-078888fd2627)
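
For context, the VolumeGroupSizeError above is raised because the requested extension is larger than the volume group's remaining free space. A minimal sketch of that kind of check (illustrative only, not vdsm's actual lvm.py code; the names and signature here are assumed):

    MEGAB = 2 ** 20  # bytes per MiB

    class VolumeGroupSizeError(Exception):
        """Raised when a requested extension does not fit in the VG's free space."""

    def check_extend(vg_name, lv_name, requested_bytes, vg_free_bytes):
        # Reject the extension up front if the VG cannot hold it, mirroring
        # the "4096 > 512 (MiB)" comparison in the log above.
        if requested_bytes > vg_free_bytes:
            raise VolumeGroupSizeError(
                "Volume Group not big enough: %s/%s %d > %d (MiB)"
                % (vg_name, lv_name,
                   requested_bytes // MEGAB, vg_free_bytes // MEGAB))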


Version-Release number of selected component (if applicable):
rc1

How reproducible:
100%

Steps to Reproduce:
Setup:
-have a domain with 3 GB of free space
-a VM with an OS disk plus a thin-provisioned disk (virtual size > 3 GB; the disk is on the free-space-deficit domain)

1. run engine-config -s FreeSpaceCriticalLowInGB=1
2. run service ovirt-engine restart
3. run the VM and dd to its disk until multiple failures occur

Actual results:
vdsm fails to extend the volume; the VM crashes shortly after.

Expected results:
1. engine should not allow FreeSpaceCriticalLowInGB to be set to negative values (already tracked as bug 1130030)
2. engine should check the VM's disks' virtual size against the domain's threshold correctly before running the VM (a hypothetical sketch of such a check follows)
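
For illustration only, the pre-run check requested in item 2 could look roughly like the following (a hypothetical sketch; can_run_vm and its parameters are made-up names, not actual engine code):

    GiB = 2 ** 30  # bytes per GiB

    def can_run_vm(disk_virtual_sizes_bytes, domain_free_bytes,
                   free_space_critical_low_gb):
        # Hypothetical validation: refuse to run the VM if its thin disks
        # could never be fully extended without crossing the domain's
        # critical-low-space threshold.
        threshold_bytes = max(free_space_critical_low_gb, 0) * GiB
        needed_bytes = sum(disk_virtual_sizes_bytes)
        return needed_bytes <= domain_free_bytes - threshold_bytes

    # Example matching this report: a 4 GiB thin disk on a domain with
    # 3 GiB free and a 1 GiB threshold -> False (the VM would be blocked)
    print(can_run_vm([4 * GiB], 3 * GiB, 1))

Whether such a check is desirable at all is exactly the overcommit question debated in the comments below.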

Additional info:

Comment 1 Allon Mureinik 2014-08-17 12:36:39 UTC
(In reply to Ori from comment #0)
> Actual results:
> vdsm fails to extend the volume; the VM crashes shortly after.
If there's no space on the domain, there's no way to extend this disk.
What do you expect to happen here?

> 
> Expected results:
> 1. engine should not allow FreeSpaceCriticalLowInGB to be set to negative
> values (already tracked as bug 1130030)
Agreed - handled as part of bug 1130030

> 2. engine should check the VM's disks' virtual size against the domain's
> threshold correctly before running the VM
I disagree. IMO, this is the point of over-committing. If the admin is willing to overcommit in this fashion, I see no reason to prevent the VM from running. Since this is essentially an SLA decision, Scott, it's your call.

Comment 2 Allon Mureinik 2014-08-27 16:52:00 UTC
(In reply to Allon Mureinik from comment #1)
> > 2. engine should check the VM's disks' virtual size against the domain's
> > threshold correctly before running the VM
> I disagree. IMO, this is the point of over-committing. If the admin is
> willing to overcommit in this fashion, I see no reason to prevent the VM
> from running. Since this is essentially an SLA decision, Scott, it's your call.

Closing based on this comment.
Scott, if you feel this is wrong, please reopen and specify the required behavior.

Comment 3 Yaniv Lavi 2015-03-08 14:06:51 UTC
This is considered a feature, not a bug, since it allows users to overcommit storage and plan future storage additions on an as-needed basis. We also display the allocated space and the overcommit ratio in the UI, so the user is informed of this status. So there is no action item here.
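
As a rough illustration of the overcommit ratio mentioned above (the formula here is an assumption, not necessarily the exact calculation the UI performs):

    def overcommit_ratio(virtual_sizes_gib, domain_capacity_gib):
        # Assumed definition: total provisioned (virtual) size divided by
        # the domain's physical capacity; above 1.0 means overcommitted.
        return sum(virtual_sizes_gib) / float(domain_capacity_gib)

    # Example: three 40 GiB thin disks on a 100 GiB domain -> ratio of 1.2
    print(overcommit_ratio([40, 40, 40], 100))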

