Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1160568

Summary: [PPC][vdsm] VM enters 'Paused' on 'no storage space error' while writing to its thin-provision disk
Product: [oVirt] vdsm
Reporter: Elad <ebenahar>
Component: General
Assignee: Nobody <nobody>
Status: CLOSED NOTABUG
QA Contact: Raz Tamir <ratamir>
Severity: low
Docs Contact:
Priority: unspecified
Version: ---
CC: alitke, amureini, bazulay, bugproxy, bugs, ebenahar, fnovak, lbopf, lpeer, lsurette, nsoffer, scohen, srevivo, tnisan, ykaul, ylavi
Target Milestone: ovirt-4.2.0
Flags: rule-engine: ovirt-4.2+
Target Release: ---
Hardware: All
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-19 16:55:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
logs (flags: none)
logs (flags: none)
repoplot analysis (flags: none)

Description Elad 2014-11-05 07:50:37 UTC
Created attachment 953936 [details]
logs

Description of problem:
I have a VM with a thin-provisioned disk attached to it. The disk is located on an FC storage domain. When writing to the disk from the guest, the VM enters the Paused state due to a 'no space left' error. After a few seconds, the VM is resumed. It seems that the volume extension is not done fast enough, so the VM disk consumes all the free space allocated to it on the storage before vdsm is able to extend it to the new actual size.
(Encountered the bug on an IBM PPC environment)

Version-Release number of selected component (if applicable):
vdsm-4.14.17-1.pkvm2_1.1.ppc64
libvirt-1.1.3-1.pkvm2_1.17.11.ppc64
qemu-1.6.0-2.pkvm2_1.17.10.ppc64

rhev 3.4.3 
rhevm-3.4.3-1.2.el6ev.noarch



How reproducible:
Always

Steps to Reproduce:
1. Create a VM with a thin-provisioned disk located on an FC domain
2. Install OS on the guest
3.

Actual results:
During OS installation, the VM enters the Paused state. It seems that writing to the disk happens too fast for vdsm, which does not manage to extend the disk before it runs out of space.

VM is stopped due to no space error:

libvirtEventLoop::INFO::2014-11-05 07:20:46,696::vm::4602::vm.Vm::(_onIOError) vmId=`bf87e50e-b931-4504-b5c8-4d704369da34`::abnormal vm stop device virtio-disk1 error enospc

Extension request is being sent after the VM was stopped:

libvirtEventLoop::DEBUG::2014-11-05 07:20:46,701::vm::2608::vm.Vm::(__extendDriveVolume) vmId=`bf87e50e-b931-4504-b5c8-4d704369da34`::Requesting an extension for the volume: {'newSize': 3072, 'domainID': 'e4556645
-98aa-44e5-834b-bfa49279cbc2', 'name': u'vdb', 'volumeID': 'b21d482b-9a0e-42a6-937c-50c45d67b81b', 'imageID': 'd3aec65f-44dd-48f0-a9d9-c9dfd7586f97', 'poolID': '3e0aea33-e35d-4968-a30c-f69af942a328'}



Expected results:
The extension request should be sent before the disk runs out of space.
The VM should not enter the Paused state.


Additional info:
I wasn't able to reproduce the issue on a RHEV 3.5 environment (with FC). It could be because of FC HBA speed differences between the 3.4.3 hosts and the 3.5 hosts.

Attaching:
engine.log
vdsm.log
supervdsm.log 
libvirt.log
qemu log for the specific VM
messages
sanlock.log
audit logs
/usr/share/hwdata/pci.ids from the host for the FC HW info

Comment 1 IBM Bug Proxy 2014-11-13 15:00:50 UTC
------- Comment From fnovak.com 2014-11-13 14:49 EDT-------
reverse mirror Bug 1160568 - [PPC][vdsm] VM enters 'Paused' on 'no storage space error' while writing to its thin-provision disk located on FC domain

Comment 2 Adam Litke 2014-11-13 15:55:40 UTC
Thin provisioned block volumes can be extended two ways (proactive and reactive).  Proactive extension is the preferred approach and the one you describe above.  Under normal circumstances we are able to detect that the VM will run out of space soon and preemptively  extend the volume.  If we don't handle it in time then libvirt pauses the VM and raises an event.  We then reactively perform the extension and unpause the VM.  This is the fallback scenario and VM installation is a known case that exercises this flow due to the high rate of block allocation that happens when installing to a fresh disk.  I am inclined to say that everything is working as designed here but let's confirm a few things...

After looking at vdsm.log it appears that the VM was stopped 2 or 3 times and only briefly (until reactive extension completed).  It also looks like the VM then ran normally thereafter.  Can you confirm these two observations?  If so, then I don't think there is anything to change here.

As a workaround, vdsm provides tunables which could make the scanning more responsive to these types of scenarios:

[irs]
volume_utilization_percent = 50
volume_utilization_chunk_mb = 1024
vol_size_sample_interval = 60

As you can see, by default we only check once per minute if extension is required.  You could specify a smaller interval in /etc/vdsm/vdsm.conf to check more frequently.  Also, you could increase the chunk_mb value to 2048 so that extensions are bigger each time.
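To make the interplay of these tunables concrete, here is a hypothetical sketch (not actual vdsm code) of how a watermark check driven by volume_utilization_percent and volume_utilization_chunk_mb could decide when and by how much to extend a thin block volume:

```python
# Hypothetical sketch (not actual vdsm code) of how the [irs] tunables
# could drive the proactive-extension decision for a thin block volume.

def needs_extension(apparent_size_mb, highest_write_offset_mb,
                    volume_utilization_percent=50):
    """Request an extension once the guest has written past the
    utilization watermark of the currently allocated size."""
    watermark_mb = apparent_size_mb * volume_utilization_percent / 100.0
    return highest_write_offset_mb >= watermark_mb

def new_size_mb(apparent_size_mb, volume_utilization_chunk_mb=1024):
    """Grow the volume by one chunk per extension round."""
    return apparent_size_mb + volume_utilization_chunk_mb

# Example: a 2048 MiB volume with 1100 MiB already written crosses the
# 50% watermark, so an extension to 3072 MiB would be requested, which
# matches the 'newSize': 3072 extension request in the log above.
```

With this model, raising volume_utilization_chunk_mb to 2048 simply makes each extension round buy twice as much headroom, at the cost of allocating more storage up front.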

It might be interesting to detect when this scenario is happening and automatically boost those values for a limited time, but that would be something for 3.6 at the earliest.

Comment 3 Elad 2014-11-13 17:24:14 UTC
Adam, thanks for the explanation, very informative.
 
> After looking at vdsm.log it appears that the VM was stopped 2 or 3 times
> and only briefly (until reactive extension completed).  It also looks like
> the VM then ran normally thereafter.  Can you confirm these two
> observations?  If so, then I don't think there is anything to change here.
Confirmed 
> As a workaround, vdsm provides tunables which could make the scanning more
> responsive to these types of scenarios:
> 
> [irs]
> volume_utilization_percent = 50
> volume_utilization_chunk_mb = 1024
> vol_size_sample_interval = 60
> 
> As you can see, by default we only check once per minute if extension is
> required.  You could specify a smaller interval in /etc/vdsm/vdsm.conf to
> check more frequently.  Also, you could increase the chunk_mb value to 2048
> so that extensions are bigger each time.
> 
> It might be interesting to detect when this scenario is happening and
> automatically boost those values for a limited time, but that would be
> something for 3.6 at the earliest.
That can be great.
But still, the fact that a VM enters the Paused state just because the user installs an OS on it gives the impression that something is broken in the system.
I'd suggest increasing the volume_utilization_chunk_mb value if a high-speed HBA/NIC is used for accessing the storage.
I encountered the issue while using a host with a 4Gb FC HBA. It could be much more noticeable with higher bandwidth.

Comment 4 Adam Litke 2014-11-14 15:04:23 UTC
In light of Comment 3, I am converting this to an RFE.  The code is currently working as designed but could be improved.

In a future release, it might be nice to boost the vol_size_sample_interval and volume_utilization_chunk_mb parameters for VMs which are experiencing frequent pauses due to -ENOSPC.

Comment 5 IBM Bug Proxy 2015-01-12 21:30:33 UTC
Created attachment 979345 [details]
logs

default comment by bridge

Comment 6 Nir Soffer 2015-08-20 19:48:49 UTC
(In reply to Adam Litke from comment #2)
> Thin provisioned block volumes can be extended two ways (proactive and
> reactive).  Proactive extension is the preferred approach and the one you
> describe above.  Under normal circumstances we are able to detect that the
> VM will run out of space soon and preemptively  extend the volume.  If we
> don't handle it in time then libvirt pauses the VM and raises an event.  We
> then reactively perform the extension and unpause the VM.

Actually, the reactive extension is pretty useless, since we check whether a drive
needs extension every 2 seconds. On average, it can save 1 second, which
is not significant since the SPM checks for extension requests once every 3 seconds,
and the HSM checks for the extension reply once every 3 seconds.

> As a workaround, vdsm provides tunables which could make the scanning more
> responsive to these types of scenarios:

This is not a workaround, just tunables so we can tune the values in the
field if needed.

> [irs]
> volume_utilization_percent = 50
> volume_utilization_chunk_mb = 1024
> vol_size_sample_interval = 60

vol_size_sample_interval is not related to extension and has no effect. 

> As you can see, by default we only check once per minute if extension is
> required.  

We check every 2 seconds, the relevant configuration is:

[vars]

# How often should we sample each vm for statistics (seconds).
vm_watermark_interval = 2

(yes the documentation is bogus)

I think we should understand better why the VM got paused in this case
before we plan fancy dynamic solutions.

In the average case, extension flow can look like this:

Time       Operation
00.00      We detect low free space and send an extend request
01.50      SPM gets the request and performs lvextend;
           this can block if many operations run in parallel due to LVM
           internal locks. Let's assume it took 0.5 seconds
02.00      SPM sends the extension reply
03.50      HSM gets the extension reply and performs an lvrefresh;
           this can block if many operations run in parallel due to LVM
           internal locks. Let's assume it took 0.5 seconds
04.00      disk was extended

In the worst case, it can take an extra 3 seconds, or just block forever
if one of the LVM operations gets stuck :-)
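The timeline above can be summed up in a small model. This is illustrative only; the polling intervals and LVM operation times are the figures quoted in this thread, not values read from vdsm itself:

```python
# Illustrative model of the extension round trip described above.
# Intervals are the figures quoted in this thread, not vdsm constants.

SPM_POLL = 3.0   # SPM picks up extend requests every 3 seconds
HSM_POLL = 3.0   # HSM picks up extension replies every 3 seconds
LVM_OP = 0.5     # assumed duration of lvextend / lvrefresh

def extension_latency(spm_wait, hsm_wait, lvm_op=LVM_OP):
    """Time from sending the extend request until the disk is usable:
    SPM pickup delay + lvextend + HSM reply pickup delay + lvrefresh."""
    return spm_wait + lvm_op + hsm_wait + lvm_op

# Average case: half of each polling interval is spent waiting.
avg = extension_latency(SPM_POLL / 2, HSM_POLL / 2)   # 4.0 seconds
# Worst case: a full polling interval on both hops, i.e. 3 seconds more.
worst = extension_latency(SPM_POLL, HSM_POLL)         # 7.0 seconds
```

Any writer sustaining more than one chunk of allocation per round trip of this latency will eventually outrun the extension and hit ENOSPC.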

Maybe the issue is blocking lvm operations, like in bug 1195421?

Comment 7 Adam Litke 2016-03-09 14:14:18 UTC
Nir,

I used your awesome new repoplot tool to examine the vdsm log and I am not seeing any signs of LVM contention for this case.  I am pretty sure they are just installing from a very fast source and are hitting a few pauses along the way.

Comment 8 Nir Soffer 2016-03-09 14:36:51 UTC
(In reply to Adam Litke from comment #7)
> Nir,
> 
> I used your awesome new repoplot tool to examine the vdsm log and I am not
> seeing any signs of LVM contention for this case.  I am pretty sure they are
> just installing from a very fast source and are hitting a few pauses along
> the way.

Can you attach your plot to the bug for inspection?

How about monitoring the write rate in vdsm, for example, keeping a running
average over the last 15 samples, and adapting the extension size based on the rate?

With the current defaults, we check every 2 seconds and extend
takes 3-6 seconds, so our extend rate is about 1 GiB / 8s.

If write rate is higher, we need to increase the extend rate.

Using a running average, our extend rate will change nicely 
based on the actual write rate.
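The running-average idea suggested above could be sketched like this. This is a hypothetical illustration, not vdsm code; the class name and the 8-second extend-cycle figure are assumptions taken from the rates discussed in this comment:

```python
from collections import deque

# Hypothetical sketch of the adaptive idea proposed above: track a
# running average of the guest's write rate over the last N samples and
# size the next extension chunk so it outlasts one extension round trip.

class AdaptiveChunk:
    def __init__(self, samples=15, min_chunk_mb=1024, extend_time=8.0):
        self.rates = deque(maxlen=samples)   # recent write rates, MiB/s
        self.min_chunk_mb = min_chunk_mb
        self.extend_time = extend_time       # observed ~1 GiB / 8 s cycle

    def record(self, written_mb, interval_s):
        """Add one watermark sample: MiB written over the interval."""
        self.rates.append(written_mb / interval_s)

    def next_chunk_mb(self):
        """Chunk large enough to cover one extend cycle at the
        current average write rate, never below the default chunk."""
        if not self.rates:
            return self.min_chunk_mb
        avg_rate = sum(self.rates) / len(self.rates)
        return max(self.min_chunk_mb, int(avg_rate * self.extend_time))
```

For example, a guest sustaining 256 MiB/s would get 2048 MiB chunks (256 MiB/s x 8 s), while an idle guest keeps the default 1024 MiB chunk.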

Comment 9 Yaniv Lavi 2016-12-05 14:17:04 UTC
This is not an RFE, it's a bug.
The current implementation may be working as designed, but it is causing this bug.
Any change to the design should aim to resolve this; therefore, reverting the change.
Please answer the needinfo from Nir.

Comment 10 Adam Litke 2017-01-10 19:11:39 UTC
Created attachment 1239209 [details]
repoplot analysis

This repoplot from the vdsm logs shows normal storage conditions with acceptable lvm command latency.  As I stated earlier, this bug is a classic case of a VM writing faster than our extension monitoring can keep up with.

Comment 11 Allon Mureinik 2017-03-19 16:55:04 UTC
For such write rates, we just need to change the default conf.
Pushing out.