Summary: | VM abnormal stop after LV refreshing when using thin provisioning on block storage | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Tal Nisan <tnisan> | |
Component: | vdsm | Assignee: | Nir Soffer <nsoffer> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Gal Amado <gamado> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 3.4.0 | CC: | acanan, amureini, bazulay, bugs, eblake, ecohen, fromani, fsimonce, gamado, gklein, iheim, jdenemar, jsuchane, kwolf, lkuchlan, lpeer, lsurette, mgoldboi, michal.skrivanek, nsoffer, ofrenkel, pbonzini, pdangur, prajnoha, rbalakri, scohen, s.kieske, yeylon, zkabelac | |
Target Milestone: | --- | Keywords: | ZStream | |
Target Release: | 3.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | storage | |||
Fixed In Version: | vdsm-4.16.7 | Doc Type: | Bug Fix | |
Doc Text: | | Story Points: | --- |
Clone Of: | 1127460 | |||
: | 1150012 1150015 | Environment: | |
Last Closed: | 2015-02-16 13:37:57 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | | Category: | --- |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Bug Depends On: | 1127460 | |||
Bug Blocks: | 1073943, 1142709, 1147536, 1150012, 1150015, 1156162, 1164308, 1164311 |
Description
Tal Nisan
2014-10-06 13:48:13 UTC
Verified to be working on:
Red Hat Enterprise Virtualization Manager Version: 3.5.0-0.14.beta.el6ev
VDSM: vdsm-4.16.6-1.el7.x86_64
On host with OS: Red Hat Enterprise Linux Server release 7.0 (Maipo)

Please backport this patch and under test and have a look http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05346.html

From: Luiz Capitulino
Subject: [Qemu-devel] [PATCH] block: extend BLOCK_IO_ERROR event with nospace indicator
Date: Fri, 29 Aug 2014 16:07:27 -0400

Management software, such as RHEV's vdsm, want to be able to allocate disk space on demand. The basic use case is to start a VM with a small disk and then the disk is enlarged when QEMU hits a ENOSPC condition.

To this end, the management software has to be notified when QEMU encounters ENOSPC. The solution implemented by this commit is simple: it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true when QEMU is stopped due to ENOSPC.

Note that support for querying this event is already present in query-block by means of the 'io-status' key. Also, the new 'nospace' BLOCK_IO_ERROR field shares the same semantics with 'io-status', which basically means that werror= has to be set to either 'stop' or 'enospc' to enable 'nospace'.

Finally, this commit also updates the 'io-status' key doc in the schema with a list of supported device models.

Signed-off-by: Luiz Capitulino <address@hidden>

Three important observations:

1. We've talked with oVirt and OpenStack folks. oVirt folks say that this implementation is enough for their use-case. OpenStack don't need this feature
2. While testing this with a raw image on a (smaller) ext2 file mounted via the loopback device, I get half "Invalid argument" I/O errors and half "No space" errors". This means that half of the BLOCK_IO_ERROR events that are emitted for this test-case will have nospace=false and the other half nospace=true. I don't know why I'm getting those "Invalid argument" errors, can anyone of the block layer comment on this? I don't get that with a qcow2 image (I get nospace=true for all events)
3. I think this should go via block tree

```diff
 block.c              | 22 ++++++++++++++--------
 qapi/block-core.json |  8 +++++++-
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 1df13ac..b334e35 100644
--- a/block.c
+++ b/block.c
@@ -3632,6 +3632,18 @@ BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int e
     }
 }
 
+static void send_qmp_error_event(BlockDriverState *bs,
+                                 BlockErrorAction action,
+                                 bool is_read, int error)
+{
+    BlockErrorAction ac;
+
+    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
+    qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
+                                   bdrv_iostatus_is_enabled(bs),
+                                   error == ENOSPC, &error_abort);
+}
+
 /* This is done by device models because, while the block layer knows
  * about the error, it does not know whether an operation comes from
  * the device or the block layer (from a job, for example).
@@ -3657,16 +3669,10 @@ void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
          * also ensures that the STOP/RESUME pair of events is emitted.
          */
         qemu_system_vmstop_request_prepare();
-        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-                                       is_read ? IO_OPERATION_TYPE_READ :
-                                       IO_OPERATION_TYPE_WRITE,
-                                       action, &error_abort);
+        send_qmp_error_event(bs, action, is_read, error);
         qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
     } else {
-        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-                                       is_read ? IO_OPERATION_TYPE_READ :
-                                       IO_OPERATION_TYPE_WRITE,
-                                       action, &error_abort);
+        send_qmp_error_event(bs, action, is_read, error);
     }
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index fb74c56..567e0a6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -336,6 +336,7 @@
 #
 # @io-status: #optional @BlockDeviceIoStatus. Only present if the device
 #             supports it and the VM is configured to stop on errors
+#             (supported device models: virtio-blk, ide, scsi-disk)
 #
 # @inserted: #optional @BlockDeviceInfo describing the device if media is
 #            present
@@ -1569,6 +1570,11 @@
 #
 # @action: action that has been taken
 #
+# @nospace: #optional true if I/O error was caused due to a no-space
+#           condition. This key is only present if query-block's
+#           io-status is present, please see query-block documentation
+#           for more information (since: 2.2)
+#
 # Note: If action is "stop", a STOP event will eventually follow the
 # BLOCK_IO_ERROR event
 #
@@ -1576,7 +1582,7 @@
 ##
 { 'event': 'BLOCK_IO_ERROR',
   'data': { 'device': 'str', 'operation': 'IoOperationType',
-            'action': 'BlockErrorAction' } }
+            'action': 'BlockErrorAction', '*nospace': 'bool' } }
 
 ##
 # @BLOCK_JOB_COMPLETED
-- 
1.9.3
```

The workaround Also can reduce a large number of guest OS high concurrent I/o read and write for lvextend pause or stop

```ini
[irs]
volume_utilization_percent = 50
volume_utilization_chunk_mb = 2048
vol_size_sample_interval = 60
```

As you can see, by default we only check once per minute if extension is required. You could specify a smaller interval in /etc/vdsm/vdsm.conf to check more frequently. Also, you could increase the chunk_mb value to 2048 or 4096 so that extensions are bigger each time.

Paolo, I assume you're the right person to address comments 3 and 4?

Regarding comment 4, in the vdsm source (vdsm/vdsm/config.py.in): `('volume_utilization_chunk_mb', '4096', None)` ...

Of course, the problem remains for now. I am still testing and have not found a better solution, but I have already tested the workaround from comment 4 and the effect is very good. I plan to write a balancing patch across three projects: libvirt, qemu-kvm and VDSM. I hope everyone will put forward better suggestions. I am mainly testing on CentOS 6.x.
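The following is a minimal Python sketch of how a management layer might consume the BLOCK_IO_ERROR event with the `nospace` field proposed in the qemu-devel patch. It assumes direct access to raw QMP event lines, which is a simplification because vdsm itself works through libvirt. The `should_extend` policy and the device name `d0` are hypothetical.

The patch only emits `nospace` when io-status is enabled (werror=stop or enospc), so a missing key is treated as "unknown" rather than false. Note that, per Nir Soffer's reply, this bug was actually fixed in vdsm's udev rules. The sketch illustrates the qemu proposal only.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockIoError:
    device: str
    operation: str           # IoOperationType: "read" or "write"
    action: str              # BlockErrorAction: "ignore", "report" or "stop"
    nospace: Optional[bool]  # None when the optional key is absent


def parse_block_io_error(line: str) -> Optional[BlockIoError]:
    """Parse one QMP message; return None for anything but BLOCK_IO_ERROR."""
    msg = json.loads(line)
    if msg.get("event") != "BLOCK_IO_ERROR":
        return None
    data = msg["data"]
    return BlockIoError(
        device=data["device"],
        operation=data["operation"],
        action=data["action"],
        # '*nospace' is optional in the patched schema; only sent when
        # io-status is enabled for the device.
        nospace=data.get("nospace"),
    )


def should_extend(ev: BlockIoError) -> bool:
    """Hypothetical policy: extend only when a write stopped the VM on ENOSPC."""
    return ev.operation == "write" and ev.action == "stop" and ev.nospace is True


if __name__ == "__main__":
    line = ('{"event": "BLOCK_IO_ERROR", "data": {"device": "d0", '
            '"operation": "write", "action": "stop", "nospace": true}, '
            '"timestamp": {"seconds": 0, "microseconds": 0}}')
    ev = parse_block_io_error(line)
    print(ev.device, should_extend(ev))  # d0 True
```

Treating `nospace` as tri-state avoids extending a volume on errors such as the "Invalid argument" case mentioned in observation 2, where `nospace` would be false.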
(In reply to sky from comment #4)
> The workaround Also can reduce a large number of guest OS high concurrent
> I/o read and write for lvextend pause or stop
>
> [irs]
> volume_utilization_percent = 50
> volume_utilization_chunk_mb = 2048
> vol_size_sample_interval = 60
>
> As you can see, by default we only check once per minute if extension is
> required.

This is *not* the configuration we use (the default is 2 seconds), and changing this is not supported. Your VMs *will* pause if you use this configuration.

(In reply to sky from comment #3)
> Please backport this patch and under test and have a look

This bug is not related to qemu; it was caused by an undocumented and backward-incompatible behavior change in udev. It was solved by modifying the vdsm udev rules. It looks like you commented on the wrong bug.
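A back-of-the-envelope model helps explain why a 60-second check interval leads to paused VMs while 2 seconds does not. Between the moment free space crosses the watermark and the moment the extension lands, the guest keeps writing.

This sketch rests on two assumptions:

- **Watermark rule (simplified).** Extension triggers when free space falls below (100 - utilization_percent)% of one chunk. This may not match vdsm's exact logic.
- **Extension latency (hypothetical).** `extend_latency_s` is a fixed placeholder, not a measured value.

The function names are illustrative, not vdsm's API.

```python
def headroom_mb(chunk_mb: float, utilization_percent: float) -> float:
    """Free space (MB) assumed to remain when an extension is triggered."""
    return chunk_mb * (100 - utilization_percent) / 100


def max_write_rate_mb_s(chunk_mb: float, utilization_percent: float,
                        check_interval_s: float, extend_latency_s: float) -> float:
    """Fastest sustained guest write rate that cannot exhaust the headroom
    before the extension completes, in the worst case where the watermark
    is crossed right after a check."""
    window_s = check_interval_s + extend_latency_s
    if window_s <= 0:
        raise ValueError("check interval plus latency must be positive")
    return headroom_mb(chunk_mb, utilization_percent) / window_s


def will_pause(write_rate_mb_s: float, chunk_mb: float, utilization_percent: float,
               check_interval_s: float, extend_latency_s: float) -> bool:
    """True if the guest can fill the LV (ENOSPC -> VM pause) under this model."""
    return write_rate_mb_s > max_write_rate_mb_s(
        chunk_mb, utilization_percent, check_interval_s, extend_latency_s)


if __name__ == "__main__":
    # Values from comment 4 (50%, 2048 MB) with a hypothetical 2 s latency.
    for interval in (60, 2):
        rate = max_write_rate_mb_s(2048, 50, interval, extend_latency_s=2.0)
        print(f"check every {interval:>2} s -> safe up to ~{rate:.1f} MB/s")
    # check every 60 s -> safe up to ~16.5 MB/s
    # check every  2 s -> safe up to ~256.0 MB/s
```

Under these assumptions, a guest writing faster than about 16 MB/s can exhaust the 1024 MB headroom before a once-per-minute check reacts. This is consistent with the warning that VMs will pause with that configuration. Increasing the chunk size raises the headroom, but shortening the interval has a much larger effect.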