Bug 1149705 - VM abnormal stop after LV refreshing when using thin provisioning on block storage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Nir Soffer
QA Contact: Gal Amado
URL:
Whiteboard: storage
Depends On: 1127460
Blocks: 1073943 1142709 rhev35betablocker 1150012 1150015 rhev3.5beta3 rhev35rcblocker rhev35gablocker
 
Reported: 2014-10-06 13:48 UTC by Tal Nisan
Modified: 2016-02-10 20:07 UTC (History)
29 users

Fixed In Version: vdsm-4.16.7
Doc Type: Bug Fix
Doc Text:
Clone Of: 1127460
Cloned to: 1150012 1150015
Environment:
Last Closed: 2015-02-16 13:37:57 UTC
oVirt Team: Storage


Attachments


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 33492 None None None Never
oVirt gerrit 33555 None None None Never
oVirt gerrit 33620 None None None Never
oVirt gerrit 33627 None None None Never
oVirt gerrit 33628 None None None Never
oVirt gerrit 33632 None None None Never

Description Tal Nisan 2014-10-06 13:48:13 UTC
+++ This bug was initially created as a clone of Bug #1127460 +++

Description of problem:

After an automatic extend of a disk, the VM stops with the error: "abnormal vm stop".

When using thin provisioning on block storage, once a disk becomes too full,
vdsm asks the SPM to extend the disk. After the disk is extended, vdsm refreshes
the lv. Soon after refreshing the lv, the vm is paused.

It is possible to reproduce this issue without extending a disk, simply by
refreshing the lv used by the vm while the vm is writing to it.

Version-Release number of selected component (if applicable):
- vdsm master
- vdsm from ovirt-3.5
- vdsm from ovirt-3.4

Platforms where issue can be reproduced:
Fedora 19
Fedora 20

Platform where issue cannot be reproduced:
RHEL 6.5

Tested disk interfaces
- virtio
- virtio-scsi
- ide

How reproducible:
Always

Steps to Reproduce:
1. Start a vm
2. Run code writing to disk on the vm:
   while true; do date > guest.log 2>&1; sync; sleep 1; done
3. Run lvchange --refresh vgname/lvname
   (see refresh.out for the full command line used)
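
The steps above can be sketched as a small driver, purely for illustration: the VG/LV names are placeholders, not the values from the attached refresh.sh.

```python
# A minimal sketch of the reproduction steps above. VG_NAME/LV_NAME are
# hypothetical placeholders, not the exact names from refresh.out.
VG_NAME = "vgname"
LV_NAME = "lvname"

# Step 2: shell loop run inside the guest; writes and syncs every second.
GUEST_WRITE_LOOP = "while true; do date > guest.log 2>&1; sync; sleep 1; done"

def refresh_command(vg, lv):
    """Build the host-side refresh command from step 3."""
    return ["lvchange", "--refresh", "%s/%s" % (vg, lv)]
```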

Actual results:
After a few milliseconds or seconds, the vm stops

Expected results:
The VM should continue to run normally

Additional info:

In /var/log/libvirt/qemu/vmname.log we don't see any error

In /var/log/libvirt/libvirt.log we see this error:
2014-08-06 21:15:39.793+0000: 821: debug : qemuMonitorIOProcess:393 : QEMU_MONITOR_IO_PROCESS: mon=0x7f89a800c480 buf={"timestamp": {"seconds": 1407359739, "microseconds": 793742}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "operation": "write", "action": "stop"}} len=173

In /var/vdsm/vdsm.log we see this error:
Thu Aug  7 00:15:39 IDT 2014 -------- refreshing lv --------
libvirtEventLoop::INFO::2014-08-07 00:15:39,795::vm::4681::vm.Vm::(_onIOError) vmId=`acf8b4d1-4218-4dad-a665-d9000fbe20dc`::abnormal vm stop device virtio-disk0 error 
libvirtEventLoop::DEBUG::2014-08-07 00:15:39,796::vm::5350::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`acf8b4d1-4218-4dad-a665-d9000fbe20dc`::event Suspended detail 2 opaque None

(The refreshing lv line is written from the refresh script)

In /var/log/messages we don't see any error after the refresh.

Attached files

- after-extend-1/ - pause triggered by automatic extend
- after-extend-2/ - another instance triggered by automatic extend
- refresh-virtio - pause triggered by refreshing the lv when disk used virtio interface
- refresh-virtio-scsi - pause triggered by refreshing the lv when disk used virtio-scsi interface
- refresh-ide - pause triggered by refreshing the lv when disk used ide interface
- refresh.sh - script refreshing the vm disk, using same command line used by vdsm
- udevadm-monitor.sh - script for logging kernel and udev event while refreshing
- rpm-qa.out - output of rpm -qa on the host

In each directory, you can find these files:

- messages - from /var/log/messages
- vdsm.log
- supervdsm.log
- libvirt.log - from /var/log/libvirt/libvirt.log
- ovirt-3.4-fc-vm01.log - qemu vm log from /var/log/libvirt/qemu
- refresh.out - output of refresh.sh
- udevadm-monitor.out - output of udevadm-monitor.sh

--- Additional comment from Nir Soffer on 2014-08-06 18:34:54 EDT ---

Zdenek, can you take a look at this? Is it possible that lvm is doing something wrong that causes an error in qemu?

Please look in refresh.out - it contains the output of
lvchange --refresh -vvvv vg/lv

--- Additional comment from Nir Soffer on 2014-08-06 18:39:32 EDT ---

Kevin, can you take a look at this? If this is not lvm, this must be qemu :-)

--- Additional comment from Nir Soffer on 2014-08-06 18:55:42 EDT ---

Additional info:

I failed to reproduce this without running a vm using vdsm.

1. Running a vm using qemu

I used a command line similar to the one used by libvirt
when running a vm via vdsm (see start-vm.sh)

I created a pv, vg and lv of 1G, and on top of it, qemu image of 10G, using
the same parameters used by vdsm.

I started the vm from PXE and installed Fedora 20. This flow
causes the vm to pause when using vdsm.

While the installer was running, I extended the lv on another host
and refreshed the lv on the same host where qemu was running. This is
a good simulation of the automatic extend done by vdsm.

I could not reproduce the issue on Fedora 19, Fedora 20 and RHEL 7.0.

2. Using dd to write to an lv

I created a pv, vg and lv using the same parameters used by vdsm.

Then I ran dd, copying a few gigabytes to the lv

While dd was running, I extended the lv from another host, and refreshed
the lv on the host where dd was running.

I could not reproduce any error; dd always completed successfully.

You can find the scripts used to test in reproduce.tar.gz

--- Additional comment from Nir Soffer on 2014-08-06 19:02:03 EDT ---

Files:

- conf.sh - configuration used by other scripts (simulates vdsm configuration)
- extend.sh - script for extending lv on another host
- refresh.sh - script for refreshing lv on the machine running qemu
- write.sh - script for running dd and refreshing lv while dd is running
- start-vm.sh - script for starting a vm, using most vdsm parameters
- start-vm-vdsm.sh - copied from /var/log/libvirt/qemu/vm.log  - this is
  how vdsm runs qemu through libvirt.

--- Additional comment from Francesco Romani on 2014-08-08 03:15:59 EDT ---

Investigating. May be relevant: https://bugs.launchpad.net/qemu/+bug/1284090

--- Additional comment from Francesco Romani on 2014-08-08 03:34:56 EDT ---

There is little I can add so far to the analysis Nir did.
VDSM is reacting correctly to the events libvirt (on behalf of QEMU) is sending.

Maybe on RHEL (where the issue cannot be reproduced) a 'resume' event is sent after the lvchange? Sharing the RHEL libvirtd/qemu logs would be helpful.

We need to make sure first that the lower levels of the stack (qemu, lvm) are behaving correctly.

--- Additional comment from Nir Soffer on 2014-08-08 08:35:24 EDT ---

(In reply to Francesco Romani from comment #6)
> Maybe on RHEL (where the issue cannot be reproduced) a 'resume' event is
> sent after the lvchange? To share the RHEL libvirtd/qemu log could be
> helpful.

Note that I can trigger this outside of vdsm - it is not related to what vdsm does after extending a disk.

--- Additional comment from Francesco Romani on 2014-08-13 10:27:49 EDT ---

removed from blockers for 3.5

Reason:
- happens also on oVirt 3.4, so it is not a 3.5 regression
- all the evidence so far points to lower levels of the stack (qemu, lvm)
- there is little (if any) room for workarounds in VDSM

That said, this bug is still very high priority and by no means less serious.

--- Additional comment from Kevin Wolf on 2014-08-13 11:09:11 EDT ---

A difference between RHEL and upstream is that RHEL includes more information
in the BLOCK_IO_ERROR event. Specifically, it includes a field that informs the
management tool about the error code (__com.redhat_reason). As far as I know,
this is what VDSM checks in order to determine whether we have an ordinary
I/O error or got an -ENOSPC and need to resize the LV. No idea whether it also
affects how the restart of the VM is done.

The missing error code fields also mean that it's hard to see from the logs
what the exact error is that happened (this is what comment 5 refers to).

--- Additional comment from Nir Soffer on 2014-08-13 11:35:44 EDT ---

(In reply to Kevin Wolf from comment #9)
Kevin, what is the next step? How can we get more info from qemu to understand this failure?

--- Additional comment from Nir Soffer on 2014-08-13 11:37:38 EDT ---

Adding back the needinfo for Zdenek.

--- Additional comment from Francesco Romani on 2014-08-13 12:08:07 EDT ---

(In reply to Kevin Wolf from comment #9)
> A difference between RHEL and upstream is that RHEL includes more information
> to the BLOCK_IO_ERROR event. Specifically it includes a field that informs
> the
> management tool about the error code (__com.redhat_reason). As far as I know,
> this is what is checked by VDSM in order to determine whether we have an
> ordinary
> I/O error or got an -ENOSPC and need to resize the LV. No idea whether it
> also
> affects how the restart of the VM is done.

That's probably it.

The error code is what VDSM uses to trigger the automatic resize of the volume (if it is 'ENOSPC', it does the resize), and this explains why the issue couldn't be reproduced on RHEL.

VDSM *does* resume the VM ('continue'), but only *after* a successful disk extension (in vdsm/virt/vm.py:__afterVolumeExtension).

The lack of an error reason does not set this chain of actions in motion.

--- Additional comment from Nir Soffer on 2014-08-13 13:14:03 EDT ---

(In reply to Francesco Romani from comment #12)
> (In reply to Kevin Wolf from comment #9)
> > A difference between RHEL and upstream is that RHEL includes more information
> > to the BLOCK_IO_ERROR event. Specifically it includes a field that informs
> > the
> > management tool about the error code (__com.redhat_reason). As far as I know,
> > this is what is checked by VDSM in order to determine whether we have an
> > ordinary
> > I/O error or got an -ENOSPC and need to resize the LV. No idea whether it
> > also
> > affects how the restart of the VM is done.
> 
> That's probably it.
> 
> The error code is what VDSM uses to trigger automatic resize of the volume
> (if it is 'ENOSPC', it does the resize), and this explains why on RHEL the
> issue couldn't be reproduced.
> 
> VDSM *does* the resume the VM ('continue') but only *after* a succesfull
> disk extension (in vdsm/virt/vm.py:__afterVolumeExtension).
> 
> The lack of error reason doesn't put in motion this chain of actions.

I don't think this is related. Refreshing the lv causes a BLOCK_IO error
in qemu, *after* a successful extend or without any extend at all. This error
causes an abnormal stop that cannot be recovered without restarting the vm.

We should understand why we get this error.

--- Additional comment from Francesco Romani on 2014-08-19 11:07:04 EDT ---

Eric, what is puzzling is that apparently this issue could not be reproduced outside VDSM, even using the same QEMU command line.

The biggest missing part is libvirt.

Is it possible that the interaction between QEMU and libvirt has a role here?

--- Additional comment from Eric Blake on 2014-08-19 13:08:50 EDT ---

What event API is VDSM using to track IO error cause?  If the code is using virConnectDomainEventRegisterAny() with VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON, then on RHEL-based qemu, you will get one of four strings ("enospc", "eio", "eperm", "eother"), and on Fedora-based qemu you will always get one string ("").  Upstream qemu is considering adding a reason field (to match what downstream RHEL has already had for a couple of years), but right now the debate is on whether it has to be a full-featured string or whether a simple boolean for nospace is sufficient.  Libvirt is just acting as a passthrough for the reason field.
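
The distinction Eric describes can be sketched as a small dispatcher (the function name and return values are illustrative, not VDSM's actual code; treating "" as a plain pause matches the behavior discussed in the surrounding comments):

```python
# Sketch of how a management application might act on the 'reason'
# string delivered with VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON.
def handle_io_error_reason(reason):
    """Map a BLOCK_IO_ERROR reason string to a management action."""
    if reason == "enospc":
        # Out of space: the LV can be extended and the VM resumed.
        return "extend"
    # "eio", "eperm", "eother" are real errors; upstream qemu reports "",
    # which is indistinguishable, so it also leaves the VM paused.
    return "pause"
```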

--- Additional comment from Eric Blake on 2014-08-19 13:16:30 EDT ---

(In reply to Francesco Romani from comment #12)
> The error code is what VDSM uses to trigger automatic resize of the volume
> (if it is 'ENOSPC', it does the resize), and this explains why on RHEL the
> issue couldn't be reproduced.
> 
> VDSM *does* the resume the VM ('continue') but only *after* a succesfull
> disk extension (in vdsm/virt/vm.py:__afterVolumeExtension).
> 
> The lack of error reason doesn't put in motion this chain of actions.

So if I understand correctly, when libvirt gives a reason of "" (because you are using a qemu that doesn't provide a reason), you treat it as a fatal error, regardless of whether the guest was paused due to ENOSPC vs paused due to some other reason?

--- Additional comment from Francesco Romani on 2014-08-21 06:59:34 EDT ---

(In reply to Eric Blake from comment #16)
> (In reply to Francesco Romani from comment #12)
> > The error code is what VDSM uses to trigger automatic resize of the volume
> > (if it is 'ENOSPC', it does the resize), and this explains why on RHEL the
> > issue couldn't be reproduced.
> > 
> > VDSM *does* the resume the VM ('continue') but only *after* a succesfull
> > disk extension (in vdsm/virt/vm.py:__afterVolumeExtension).
> > 
> > The lack of error reason doesn't put in motion this chain of actions.
> 
> So if I understand correctly, when libvirt gives a reason of "" (because you
> are using a qemu that doesn't provide a reason), you treat it as a fatal
> error, regardless of whether the guest was paused due to ENOSPC vs paused
> due to some other reason?

This is correct: VDSM has no means to distinguish what happened.
We cannot simply use werror=enospc in QEMU (passed through libvirt, of course), because VDSM needs to be aware of the other errors as well.

--- Additional comment from Francesco Romani on 2014-08-21 07:17:25 EDT ---

I spent quite some time reproducing this issue locally, and I believe I reached a stable point.

Quick summary:
- VDSM using upstream QEMU cannot transparently extend the volume because of the lack of the 'reason' field. This is not a VDSM bug; an upstream QEMU bug was filed as stated in https://bugzilla.redhat.com/show_bug.cgi?id=1127460#c5 ; we need a fix in QEMU.
- the LV refresh issue seems unrelated and not an issue

Now I'm going to explain the above points in more detail:

+++ This is how the flow is supposed to work:
flow#1. VDSM runs a VM normally on a thin-provisioned, qcow2 formatted, block device using LVM. Through libvirt, qemu is configured with werror=stop.
flow#2. the drive runs out of space
flow#2.a. QEMU stops the VM (instructed by VDSM at point #1)
flow#2.b. QEMU reports a BLOCK_IO_ERROR with reason='enospc' to signal that space is exhausted
flow#3. libvirt just translates the monitor event into its own event format
flow#4. VDSM detects the event
flow#4.a. runs lvextend $options on the affected LV
flow#4.b. runs lvchange $options on the affected LV
flow#4.c. un-pauses the VM using the 'continue' command
flow#5. the VM restarts
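
The flow above can be sketched as follows; the helper names, the +1G extend step, and the injected callables are ours, while the real logic lives around vdsm/virt/vm.py:__afterVolumeExtension.

```python
# Sketch of flow#4: react to a BLOCK_IO_ERROR while the VM is stopped
# by werror=stop, then extend, refresh, and un-pause.
def after_io_error(reason, vg, lv, run_lvm, send_monitor):
    """run_lvm(argv) runs one LVM command; send_monitor(cmd) sends a
    command through the QEMU monitor. Both are injected so the flow
    stays testable without real storage."""
    if reason != "enospc":
        return False  # not a space problem: leave the VM paused
    lv_path = "%s/%s" % (vg, lv)
    run_lvm(["lvextend", "-L", "+1G", lv_path])   # flow#4.a
    run_lvm(["lvchange", "--refresh", lv_path])   # flow#4.b
    send_monitor("continue")                      # flow#4.c: un-pause
    return True                                   # flow#5: VM restarts
```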

Please note:
if no one sends the 'continue' command (flow#4.c) through the QEMU monitor,
the VM will remain paused! And this is what I believe happened in the original report.

+++ What I believe happened in the original report:

We definitely saw this:

"event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "operation": "write", "action": "stop"}}

In the reproduce scripts/logs I see werror=stop in the QEMU command line

AFAIK the only source of BLOCK_IO_ERROR is a failed write (I looked at the QEMU sources), so this confirms that QEMU ran out of space and triggered the (broken) flow described above.

Please note that timing is critical here: if the LV is extended before QEMU hits the limit, then AFAIK it will happily run without issue.

This fully explains what happened in the original report.

+++ Impact of LV refresh

There is a pending question, restated in https://bugzilla.redhat.com/show_bug.cgi?id=1127460#c3 : can an LV refresh *alone* cause harm to QEMU and make it stop?

I don't see how it could be possible, but it is worth checking anyway.

To verify this I did the following:

- create an LV both on a physical disk and then through iSCSI;
- create a qcow2 image on it
- make sure the image is big enough, so we *do not* run out of space and do
  not hit the broken flow
- run QEMU using slightly amended reproduce scripts, using the relevant options
  used by VDSM
- install Fedora 20
- while QEMU is accessing the disk, continuously refresh the affected LV every 5s

This test passed cleanly on F20 and RHEL7.

+++ Conclusion

I believe the LV refresh lead is a red herring. The issue here is the lack of a reason on BLOCK_IO_ERROR, which completely breaks the automatic extend flow.
That bug has already been reported, and it is well outside VDSM's control.

VDSM has no issue here.
Because of all the above, I'm going to decrease the priority of this bug.

--- Additional comment from Francesco Romani on 2014-08-21 07:26:43 EDT ---

Scripts and logs to verify whether the LV refresh is harming QEMU.

scripts/chainrefresh.sh - runs the refresh every 5s and logs the output
scripts/start-vm-vdsm2.sh - runs QEMU using (almost) the same parameters as VDSM. Amended the SPICE parameters and the paths, which are not relevant to this BZ
scripts/start-vm2.sh - simpler QEMU invocation, used only once on F20
scripts/conf.sh - parameters: LVM config, LV and VG paths

logs:
f20_simple: run of start-vm2.sh on a F20 host, virt-preview repo enabled, LV on a physical disk
f20_vdsm: like above, but using start-vm-vdsm2.sh
rhel7: like above, on RHEL7, using stock packages plus RHEV repo.

on each log dir:
refresh.log: output of scripts/chainrefresh.sh
qemu_mon.log: transcript of the QEMU monitor messages, either direct connection (f20_simple) or through qmp-shell.

--- Additional comment from Francesco Romani on 2014-08-21 07:41:24 EDT ---

(In reply to Francesco Romani from comment #18)

> AFAIK the only source of BLOCK_IO_ERROR is a failed write (looked at the
> QEMU sources), so this confirms that QEMU runned out of space and triggered
> the (broken) flow below.

I need to correct myself. What I meant is that only a failed write can trigger a BLOCK_IO_ERROR like the one reported.

There are indeed other possible sources of errors, like failed reads.

However, this is a minor point; the core point is that there is no evidence that an LV refresh alone can cause an I/O error.

--- Additional comment from Francesco Romani on 2014-08-21 07:47:26 EDT ---

A workaround does exist for non-RHEL hosts. For CentOS, there is a qemu-kvm-rhev package which does report the 'reason' field, so the automatic extend can take place.

--- Additional comment from Nir Soffer on 2014-08-21 10:28:36 EDT ---

(In reply to Francesco Romani from comment #20)
> However, this is a minor point; the core point is there is no evidence of a
> LV refresh alone can cause an I/O error.

Francesco, I think you are running the lvm refresh incorrectly:

    lvchange -vvvvvv --refresh $vg_name/$lv_name

On Fedora and EL 7, this command does nothing because the lvmetad daemon is caching metadata. vdsm uses the use_lvmetad=0 option to actually go to the storage.

You must run lvm commands with the --config options used by vdsm; this is why I'm using conf.sh. And you should also create the pvs, vgs, and lvs using the same parameters that vdsm is using.

Please see attachment 924640 [details] for the details.
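
A sketch of composing the invocation Nir describes; the config body mirrors the refresh.sh output quoted later in this bug, while the helper function itself is ours.

```python
# Sketch: compose lvchange --refresh with the --config overrides vdsm
# uses, so lvm reads metadata from storage instead of the lvmetad cache.
LVM_CONFIG = (
    "devices { ignore_suspended_devices=1 write_cache_state=0 "
    "disable_after_error_count=3 obtain_device_list_from_udev=0 } "
    "global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 "
    "use_lvmetad=0 } "
    "backup { retain_min = 50 retain_days = 0 }"
)

def refresh_with_config(vg, lv):
    """Build the refresh command that bypasses the lvmetad cache."""
    return ["lvchange", "--config", LVM_CONFIG,
            "--refresh", "%s/%s" % (vg, lv)]
```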

--- Additional comment from Nir Soffer on 2014-08-21 12:20:37 EDT ---

Francesco, can you confirm that you tested the refresh incorrectly? see comment 22.

--- Additional comment from Francesco Romani on 2014-08-21 19:30:38 EDT ---

(In reply to Nir Soffer from comment #23)
> Francesco, can you confirm that you tested the refresh incorrectly? see
> comment 22.

Yes.
I didn't use the --config option or the options you originally used.
I'll do some testing again.

--- Additional comment from Michal Skrivanek on 2014-08-22 04:17:23 EDT ---

the reported reason is not relevant anymore as we ship qemu-kvm-rhev for all platforms
there's no report about this issue in any known environment
- hence decreasing the urgency

there's no data corruption, "worst" case is VM gets paused
- hence decreasing severity

possible issue in resize and/or lvm refresh flows
- hence moving to storage

not a blocker for 3.5 at this point

--- Additional comment from Kevin Wolf on 2014-08-22 05:41:06 EDT ---

(In reply to Francesco Romani from comment #18)
> - VDSM using u/s QEMU cannot trasparently extend the volume because the lack
> of the 'reason' field. Not VDSM bug, QEMU bug upstream filed as stated into
> https://bugzilla.redhat.com/show_bug.cgi?id=1127460#c5 ; We need fix on QEMU.
>
> [...]
>
> +++ This is how the flow is supposed to work:
> flow#1. VDSM runs a VM normally on a thin-provisioned, qcow2 formatted,
> block device using LVM. Through libvirt, qemu is configured with werror=stop.
> flow#2. the drive runs out of space
> flow#2.a. QEMU stops the VM (instructed by VDSM at point #1)
> flow#2.b. QEMU reports a BLOCK_IO_ERROR with reason='enospc' to signal the
> space exausted

Please note that this is already not the regular flow, but the backup solution.
Generally, management should try to extend the LVs early enough that qemu never
runs into an ENOSPC condition. In order to achieve this, it uses query-block
results, specifically the high watermark.

If VDSM wants to be able to cope with qemu versions that don't include an error
reason (which are all upstream versions up to now), it could still simply check
the high watermark after each I/O error and if it's close to the LV size, it
could resize and give it a try if the VM can resume.
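
Kevin's fallback can be sketched as a simple check; the function name and slack threshold are arbitrary illustrative choices, not anything qemu or VDSM defines.

```python
# Sketch of the fallback for qemu builds without a 'reason' field:
# after any I/O error, compare the drive's high watermark to the LV
# size; if they are close, the pause was plausibly ENOSPC and an
# extend-and-resume attempt is worthwhile.
def should_try_extend(watermark_bytes, lv_size_bytes, slack_bytes=512 * 1024 ** 2):
    """True if the high watermark is within slack_bytes of the LV end."""
    return lv_size_bytes - watermark_bytes <= slack_bytes
```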

--- Additional comment from Nir Soffer on 2014-08-24 06:32:14 EDT ---

(In reply to Michal Skrivanek from comment #25)
> the reported reason is not relevant anymore as we ship qemu-kvm-rhev for all
> platforms
> there's no report about this issue in any known environment
> - hence decreasing the urgency

This is a report from my (known?) environment. Do you suggest to wait until users complain about it?

> possible issue in resize and/or lvm refresh flows
> - hence moving to storage

It is not related to resize; this bug shows how you can trigger a pause by refreshing an lv.

There is no problem with storage - we do extend the disk, but qemu pauses the vm after the extend has completed successfully.
- hence returning to virt

--- Additional comment from Francesco Romani on 2014-08-25 02:08:31 EDT ---

(In reply to Nir Soffer from comment #1)
> Zdenek, can you take a look at this? Is is possible that lvm is doing
> something wrong that cause an error in qeum?
> 
> Please look in refresh.out - it contains the output of
> lvchange --refresh -vvvv vg/lv

Ping?

--- Additional comment from Francesco Romani on 2014-08-25 05:34:00 EDT ---

Looks like SELinux could be the culprit here. I did the following test with SELinux *disabled* (see the last line) on RHEL7:

GENji> 11:28:54 root [/home/fromani/bz1127460/reproduce]$ ./refresh.sh 
refreshing lv...
lvchange --config 
devices {
	ignore_suspended_devices=1
	write_cache_state=0
	disable_after_error_count=3
	obtain_device_list_from_udev=0
}
global {
	locking_type=1
	prioritise_write_locks=1
	wait_for_locks=1
	use_lvmetad=0
}
backup {
	retain_min = 50
	retain_days = 0
}
 --refresh 8ceb838a-4e74-420d-b1e2-817c0e9f8eea/aa39c714-3095-4b86-af56-a2666849a573
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
lv size:   WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  9.00g
GENji> 11:28:58 root [/home/fromani/bz1127460/reproduce]$ date
Mon Aug 25 11:29:27 CEST 2014
GENji> 11:29:27 root [/home/fromani/bz1127460/reproduce]$ ls -lh /dev/8ceb838a-4e74-420d-b1e2-817c0e9f8eea/aa39c714-3095-4b86-af56-a2666849a573
lrwxrwxrwx. 1 root root 8 Aug 25 11:28 /dev/8ceb838a-4e74-420d-b1e2-817c0e9f8eea/aa39c714-3095-4b86-af56-a2666849a573 -> ../dm-10
GENji> 11:30:14 root [/home/fromani/bz1127460/reproduce]$ ls -lh /dev/8ceb838a-4eaudit2why < /var/log/audit/audit.log > grep dm-10 | tail
GENji> 11:30:34 root [/home/fromani/bz1127460/reproduce]$ rm grep 
rm: remove regular file ‘grep’? y
GENji> 11:30:40 root [/home/fromani/bz1127460/reproduce]$ audit2why < /var/log/audit/audit.log | grep dm-10 | tail
type=AVC msg=audit(1408957266.925:3526): avc:  denied  { write } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958068.190:3612): avc:  denied  { write } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958366.925:3635): avc:  denied  { write } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958671.665:3671): avc:  denied  { write } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958679.231:3672): avc:  denied  { read } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958702.908:3680): avc:  denied  { write } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958749.529:3684): avc:  denied  { read } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958947.313:3713): avc:  denied  { write } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408958953.521:3735): avc:  denied  { read } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1408959007.604:3746): avc:  denied  { write } for  pid=18698 comm="qemu-kvm" path="/dev/dm-10" dev="devtmpfs" ino=291267 scontext=system_u:system_r:svirt_t:s0:c301,c701 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
GENji> 11:30:47 root [/home/fromani/bz1127460/reproduce]$ date --date='@1408959007'
Mon Aug 25 11:30:07 CEST 2014
GENji> 11:31:01 root [/home/fromani/bz1127460/reproduce]$ getenforce 
Permissive
GENji> 11:31:19 root [/home/fromani/bz1127460/reproduce]$ virsh list
 Id    Name                           State
----------------------------------------------------
 9     F20_C1                         running

GENji> 11:32:49 root [/home/fromani/bz1127460/reproduce]$ vdsClient localhost list table
56d1c657-dd76-4609-a207-c050699be5be  18698  F20_C1               Up                                       


IIRC libvirt does some SELinux setup for the VM, and this could explain why the issue couldn't be reproduced running QEMU alone.

Nir, can you please confirm that disabling SELinux fixes the issue on your setup as well?

--- Additional comment from Zdenek Kabelac on 2014-08-25 05:50:50 EDT ---

Since you mention selinux here - we have been noticing for some time that running our lvm2 test suite on an selinux-enabled system slows the whole runtime by a factor of 4 or more - i.e. instead of 15 minutes it can take more than an hour - so there are issues to be resolved.

We have had a discussion with D. Walsh on how to audit this, since lvm2 does some operations based on RHEL5 selinux usage - but with RHEL6, some things are done differently.

--- Additional comment from Michal Skrivanek on 2014-08-25 08:20:19 EDT ---

Can you please check from libvirt's point of view? Seems that's our next lead...

--- Additional comment from Francesco Romani on 2014-08-26 03:41:58 EDT ---

I'd like to add a few more details.

The following applies to a stock RHEL7; I'd like to reiterate that https://bugzilla.redhat.com/show_bug.cgi?id=1127460#c29 also referred to a stock RHEL7.

So, on RHEL7 with SELinux *enabled* (everything as default), we have:

GENji> 09:36:03 root [/home/fromani/bz1127460/reproduce]$ ls -lh /dev/8ceb838a-4e74-420d-b1e2-817c0e9f8eea/aa39c714-3095-4b86-af56-a2666849a573
lrwxrwxrwx. 1 root root 8 Aug 26 09:34 /dev/8ceb838a-4e74-420d-b1e2-817c0e9f8eea/aa39c714-3095-4b86-af56-a2666849a573 -> ../dm-27
GENji> 09:36:13 root [/home/fromani/bz1127460/reproduce]$ ls -lh /dev/dm-
ls: cannot access /dev/dm-: No such file or directory
GENji> 09:36:19 root [/home/fromani/bz1127460/reproduce]$ ls -lh /dev/dm-2
brw-rw----. 1 root disk 253, 2 Aug 26 09:34 /dev/dm-2
GENji> 09:36:20 root [/home/fromani/bz1127460/reproduce]$ ls -lh /dev/dm-27
brw-rw----. 1 vdsm qemu 253, 27 Aug 26 09:36 /dev/dm-27
GENji> 09:36:22 root [/home/fromani/bz1127460/reproduce]$ ls -lhZ /dev/dm-27
brw-rw----. vdsm qemu system_u:object_r:svirt_image_t:s0:c575,c891 /dev/dm-27
GENji> 09:36:26 root [/home/fromani/bz1127460/reproduce]$ virsh list
 Id    Name                           State
----------------------------------------------------
 2     F20_C1                         running

GENji> 09:36:35 root [/home/fromani/bz1127460/reproduce]$ ./refresh.sh 
refreshing lv...
lvchange --config 
devices {
	ignore_suspended_devices=1
	write_cache_state=0
	disable_after_error_count=3
	obtain_device_list_from_udev=0
}
global {
	locking_type=1
	prioritise_write_locks=1
	wait_for_locks=1
	use_lvmetad=0
}
backup {
	retain_min = 50
	retain_days = 0
}
 --refresh 8ceb838a-4e74-420d-b1e2-817c0e9f8eea/aa39c714-3095-4b86-af56-a2666849a573
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
lv size:   WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  11.00g
GENji> 09:36:40 root [/home/fromani/bz1127460/reproduce]$ ls -lhZ /dev/dm-27
brw-rw----. vdsm qemu system_u:object_r:fixed_disk_device_t:s0 /dev/dm-27
GENji> 09:36:41 root [/home/fromani/bz1127460/reproduce]$ virsh list
 Id    Name                           State
----------------------------------------------------
 2     F20_C1                         paused


As this excerpt shows, the LV refresh makes the device lose its SELinux label.
Proper, required labeling is ensured by libvirt and documented here:

http://libvirt.org/drvqemu.html

I'd like to quote in particular:

"Likewise physical block devices must be labelled system_u:object_r:virt_image_t. "

The incorrect labeling after an LV refresh prevents any further I/O operations by QEMU.

I have every reason to believe that the above also applies to F20 and F19, as the original bug report documents. Will verify on Fedora ASAP.
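
The label regression can be detected mechanically; a sketch follows. The two context strings in the test are exactly the ones seen in the ls -lhZ output above, while the parsing helper is ours.

```python
# Sketch: decide whether a device node still carries the sVirt image
# label that libvirt sets on the guest's block device.
def has_svirt_label(selinux_context):
    """True if the context's type field is svirt_image_t."""
    # A context looks like user:role:type:level, e.g.
    # system_u:object_r:svirt_image_t:s0:c575,c891
    parts = selinux_context.split(":")
    return len(parts) >= 3 and parts[2] == "svirt_image_t"
```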

--- Additional comment from Francesco Romani on 2014-08-26 04:29:44 EDT ---

Same on stock Fedora 20:

[root@benji reproduce]# ls -lhZ /dev/8ceb838a-4e74-420d-b1e2-817c0e9f8eea/94856712-7503-466e-8586-6b66981b7b23
lrwxrwxrwx. root root system_u:object_r:device_t:s0    /dev/8ceb838a-4e74-420d-b1e2-817c0e9f8eea/94856712-7503-466e-8586-6b66981b7b23 -> ../dm-8
[root@benji reproduce]# ls -lhZ /dev/dm-8
brw-rw----. vdsm qemu system_u:object_r:svirt_image_t:s0:c516,c990 /dev/dm-8
[root@benji reproduce]# chmod 0755 refresh.sh 
[root@benji reproduce]# ./refresh.sh 
refreshing lv...
lvchange --config 
devices {
	ignore_suspended_devices=1
	write_cache_state=0
	disable_after_error_count=3
	obtain_device_list_from_udev=0
}
global {
	locking_type=1
	prioritise_write_locks=1
	wait_for_locks=1
	use_lvmetad=0
}
backup {
	retain_min = 50
	retain_days = 0
}
 --refresh 8ceb838a-4e74-420d-b1e2-817c0e9f8eea/94856712-7503-466e-8586-6b66981b7b23
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
lv size:   WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  2.00g
[root@benji reproduce]# ls -lhZ /dev/dm-8
brw-rw----. vdsm qemu system_u:object_r:fixed_disk_device_t:s0 /dev/dm-8
[root@benji reproduce]# cat /etc/redhat-release 
Fedora release 20 (Heisenbug)
[root@benji reproduce]# getenforce 
Enforcing
[root@benji reproduce]#

--- Additional comment from Sven Kieske on 2014-08-28 04:02:12 EDT ---

(In reply to Michal Skrivanek from comment #25)
> the reported reason is not relevant anymore as we ship qemu-kvm-rhev for all
> platforms

I'm sorry, but this seems not to be true for EL6:

There is a Jenkins job running which builds these packages,
but they don't get installed by default on CentOS 6.5;
instead, the original qemu-kvm package provided by the CentOS repo is used,
which lacks some features.

So there are still some steps missing in providing the mentioned package
via the official ovirt repo.

--- Additional comment from Allon Mureinik on 2014-08-28 04:31:19 EDT ---

(In reply to Sven Kieske from comment #34)
> (In reply to Michal Skrivanek from comment #25)
> > the reported reason is not relevant anymore as we ship qemu-kvm-rhev for all
> > platforms
> 
> I'm sorry but this seems not to be true for EL6:
The fix for bug 1127763 makes vdsm depend on qemu-kvm-rhev, which obsoletes qemu-kvm, so yum installing VDSM should pull it in.

I verified this behavior with vdsm-14.6.2 on CentOS 6.5 - if you have seen different behavior, that's a bug - could you please file one with all the details about the OS and the vdsm you're installing?

--- Additional comment from Sven Kieske on 2014-08-28 05:13:59 EDT ---

(In reply to Allon Mureinik from comment #35)
> (In reply to Sven Kieske from comment #34)
> > (In reply to Michal Skrivanek from comment #25)
> > > the reported reason is not relevant anymore as we ship qemu-kvm-rhev for all
> > > platforms
> > 
> > I'm sorry but this seems not to be true for EL6:
> The fix for bug 1127763 makes vdsm depend on qemu-kvm-rhev, which outdates
> qemu-kvm, so yum installing VDSM should pull.
> 
> I verified this behavior with vdsm-14.6.2 on Centos 6.5 - if you have seen a
> different behavior, that a bug - could you please file one with all the
> details about the OS and the vdsm you're installing?

Okay, sorry, then my information was just outdated,
as I'm not running bleeding edge RC software on my production environment ;)

When I look at this bug it is just fixed for 3.5 RC.

Will this get backported to 3.4.4/5 ?

I'm currently still running stuff from 3.3.z repos (locally mirrored)
and just plan the upgrade to 3.4.z


So I guess that for most real-world deployments your statement is still not true, and it makes people think it should work.

But for the upcoming 3.5 release it might be correct, I can't test this
atm.

--- Additional comment from Nir Soffer on 2014-08-28 09:18:38 EDT ---

(In reply to Francesco Romani from comment #33)
Francesco, about the libvirt SELinux setup:
1. Is this new behavior? Is it used on RHEL 6.5?
2. Do we enable this feature, or is it used by default?
3. Can we disable this "feature"?

I think that libvirt setting a SELinux label on *our* device is wrong. We should control our devices' permissions and SELinux context, and libvirt must use what we provided.

We set udev rules to set the permissions and ownership of volumes before they are used. So a possible fix may be to modify these rules to add the selinux context.

    def appropriateDevice(self, guid, thiefId):
        ruleFile = _UDEV_RULE_FILE_NAME % (guid, thiefId)
        rule = 'SYMLINK=="mapper/%s", OWNER="%s", GROUP="%s"\n' % (
            guid, DISKIMAGE_USER, DISKIMAGE_GROUP)
        with open(ruleFile, "w") as rf:
            self.log.debug("Creating rule %s: %r", ruleFile, rule)
            rf.write(rule)
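A minimal sketch of what such a modified rule could look like, following the chcon idea discussed later in comment 42. Everything here is an assumption for illustration: the function name, the constant values, and the chcon approach are hypothetical, not the fix that was actually merged.

```python
# Hypothetical sketch, not vdsm's actual code: build a udev rule that,
# besides ownership, restores the SELinux type via a RUN+= chcon so the
# label survives an LV refresh. Constants below are assumed values.
DISKIMAGE_USER = "vdsm"
DISKIMAGE_GROUP = "qemu"


def build_appropriation_rule(guid):
    """Return a udev rule line for the multipath device `guid` that sets
    ownership and re-applies the virt_image_t SELinux type."""
    return (
        'SYMLINK=="mapper/%s", OWNER="%s", GROUP="%s", '
        'RUN+="/bin/chcon -t virt_image_t $env{DEVNAME}"\n'
        % (guid, DISKIMAGE_USER, DISKIMAGE_GROUP)
    )
```

Writing the returned string to the per-device rule file (as `appropriateDevice` does above) would then cover both ownership and labeling in one rule.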

--- Additional comment from Allon Mureinik on 2014-08-28 14:42:38 EDT ---

(In reply to Sven Kieske from comment #36)
> (In reply to Allon Mureinik from comment #35)
> > (In reply to Sven Kieske from comment #34)
> > > (In reply to Michal Skrivanek from comment #25)
> > > > the reported reason is not relevant anymore as we ship qemu-kvm-rhev for all
> > > > platforms
> > > 
> > > I'm sorry but this seems not to be true for EL6:
> > The fix for bug 1127763 makes vdsm depend on qemu-kvm-rhev, which outdates
> > qemu-kvm, so yum installing VDSM should pull.
> > 
> > I verified this behavior with vdsm-14.6.2 on Centos 6.5 - if you have seen a
> > different behavior, that a bug - could you please file one with all the
> > details about the OS and the vdsm you're installing?
> 
> Okay sorry, than my information was just outdated
> as I'm not running bleeding edge RC software on my production environment ;)
> 
> When I look at this bug it is just fixed for 3.5 RC.
> 
> Will this get backported to 3.4.4/5 ?
I've cloned bug 1127763 to bug 1135061 to track the backporting of this issue.
It may require some work, but I don't see any reason why we can't do it.

--- Additional comment from Jaroslav Suchanek on 2014-08-29 09:33:55 EDT ---

Jiri, can you please comment it? Thanks.

--- Additional comment from Jiri Denemark on 2014-08-29 11:52:45 EDT ---

So from comment 33, it looks like the SELinux label is lost when the LV is refreshed. If it's not possible to fix that, I think the only solution is to create a udev rule that would restore the label, to make sure it is restored as soon as possible. However, unless vdsm uses static SELinux labels, the udev rule will have to be changed every time a domain is started/stopped.

I guess the main question here is why does the label disappear? Is it because the device is removed and recreated during refresh or something calls restorecon? And can anything that causes it be avoided?

--- Additional comment from Nir Soffer on 2014-09-01 13:55:44 EDT ---

Zdenek, can you explain why a device loses the SELinux label after refresh? Is this expected? A bug?

--- Additional comment from Nir Soffer on 2014-09-01 14:14:03 EDT ---

Assuming that we do need to keep selinux label on the device, we can use
the new SECLABEL{module} key. The feature is available in systemd git,
and hopefully will be backported to RHEL 7.

Vdsm rule should look like this:

    SYMLINK=="mapper/xyz", OWNER="vdsm", GROUP="kvm", SECLABEL{selinux}="virt_image_t"

Until this feature is available, this should also work:

    SYMLINK=="mapper/xyz", OWNER="vdsm", GROUP="kvm", RUN+="/bin/chcon -t virt_image_t $env{DEVNAME} $env{DEVLINKS}"

Although it seems that it did not work for the Oracle folks who hit a similar issue; they are running
the chcon command from a script instead of directly in the udev rule.

See bug 1015300 for more info.

--- Additional comment from Zdenek Kabelac on 2014-09-02 07:25:48 EDT ---

LVM2 is not setting any selinux labels - it's all now happening in udev - so all things are going through udev rules - there is a template-like rule file:

 12-dm-permissions.rules

where you can see how the SELinux labels can be handled.

Also, using 'SYMLINK==' is not the way to go - check ENV{DM_VG_NAME} and the other already-set vars.


Other thing which might be worth to check/test here is  lvm.conf option:

devices { disable_after_error_count = 1 }

This should help eliminate a failing device noticeably faster.

--- Additional comment from Nir Soffer on 2014-09-02 13:20:01 EDT ---

(In reply to Zdenek Kabelac from comment #43)
> LVM2 is not setting any  selinux labels - it's all now happing in udev
Sure, but is it expected that after a refresh (lvchange --refresh), a label set on the device will be lost?

We see this error only on Fedora 19, Fedora 20 and RHEL 7, but not on RHEL 6. Is lvm refresh different on these versions?

--- Additional comment from Nir Soffer on 2014-09-14 14:48:26 EDT ---



--- Additional comment from Nir Soffer on 2014-09-15 02:22:15 EDT ---

The udev rule mentioned in comment 37 is not relevant, we use these only for direct luns. vdsm images get their permissions from 12-vdsm-lvm.rules.

--- Additional comment from Allon Mureinik on 2014-09-16 08:42:50 EDT ---

So what's the verdict here?
Do we need a new SELinux policy?

--- Additional comment from Nir Soffer on 2014-09-16 09:22:05 EDT ---

(In reply to Allon Mureinik from comment #47)
> So what's the verdict here?
> Do we need a new SELinux policy?

No. We are blocked on:
- getting an answer for comment 44
- understand why it works for RHEL 6.5

We can fix this corner by adding libvirt selinux label on our images. I tested and it does prevent the pausing on refresh. However this may break other flows like reading or writing data to an image using qemu or dd.

--- Additional comment from Zdenek Kabelac on 2014-09-16 10:47:08 EDT ---

(In reply to Nir Soffer from comment #44)
> (In reply to Zdenek Kabelac from comment #43)
> > LVM2 is not setting any  selinux labels - it's all now happing in udev
> Sure, but is it expected that after a refresh (lvchange --refresh), a label
> set on the device will be lost?
> 
> We see this error only on Fedora 19, Fedora 20 and RHEL 7, but not on RHEL
> 6. Is lvm refresh different on these versions?

Nope - lvrefresh hasn't been changed for a long time - it's just a suspend/resume per active LV.

All the selinux label magic is hidden in the udev rules processing.

--- Additional comment from Francesco Romani on 2014-09-23 05:49:26 EDT ---

(In reply to Nir Soffer from comment #48)
> We can fix this corner by adding libvirt selinux label on our images. I
> tested and it does prevent the pausing on refresh. However this may break
> other flows like reading or writing data to an image using qemu or dd.

I agree.

The root cause is not yet sorted out. I believe the most likely cause is a udev policy change (maybe due to systemd?), but it hasn't been pinpointed yet.

The real fix is probably against the udev rules. If so, I'm not convinced VDSM is the right place to deliver it. However, I will work in this direction.

--- Additional comment from Sandro Bonazzola on 2014-09-24 04:11:24 EDT ---

Re-targeting since 3.4.4 has been released and only security/critical fixes will be allowed for 3.4.z. 3.5.0 is also in the blockers-only phase, so re-targeting to 3.5.1.

--- Additional comment from Nir Soffer on 2014-09-28 15:35:46 EDT ---

Testing on RHEL 6.5 shows:

1. Libvirt sets the same SELinux label (svirt_image_t) on the block device backing the lv (e.g. /dev/dm-40)
2. The SELinux label is *not* lost after refreshing the lv
3. There is no udev rule setting this label, so it is probably libvirt

So the question is why the SELinux label is lost on RHEL 7 (and Fedora), and not on RHEL 6.5.

Zdenek, do you have any idea why this happens on RHEL 7 (and Fedora) and not on RHEL 6? Can you get someone from device mapper to look into this?

It seems that the only thing we (vdsm) can do is to update our lvm rules to add this SELinux label to all standard vdsm images.

--- Additional comment from Zdenek Kabelac on 2014-09-29 03:40:19 EDT ---

There was no selinux-related change in the lvm2 code base between RHEL 6 & 7 - and in fact the code base is mostly equal, depending on which version the release is based on.

Looking at the vdsm package's udev rule file - it seems that even on RHEL 6 there was nothing 'selinux' related - the rules only set OWNER and GROUP - so I'd have 'guessed' it's something 'selinux' policy based.

I think some selinux expert is needed to answer your question.

Maybe version 6 allowed setting the context from OWNER:GROUP,
while version 7 needs the process (which is udev in this case) to set it,
and the fix should go along the lines of comment 42.
(wow, we have number 42 in the answer :))

Note - looking at those huge udev rule matching options - I think it would be far easier to use some simple common LV name prefix (or VG name, if all vdsm volumes have their own VG) like "VDSM_" and match LV devices with the proper prefix (DM_VG_NAME, DM_LV_NAME).
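A rule along those lines might look like the sketch below. This is illustrative only: the "VDSM_" prefix is the hypothetical naming convention suggested above (not what vdsm actually uses), and the SECLABEL key follows comment 42's proposal, which requires a new-enough systemd/udev.

```
# Hypothetical sketch, not the actual vdsm rule: match device-mapper
# devices by an assumed "VDSM_" LV name prefix instead of SYMLINK==,
# and restore the SELinux type via SECLABEL (needs recent systemd/udev).
ENV{DM_LV_NAME}=="VDSM_*", OWNER="vdsm", GROUP="qemu", \
    SECLABEL{selinux}="virt_image_t"
```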

--- Additional comment from Nir Soffer on 2014-09-29 08:14:56 EDT ---



--- Additional comment from Federico Simoncelli on 2014-09-30 05:50:40 EDT ---

For reference, the underlying bug is bug 1147910.

--- Additional comment from Nir Soffer on 2014-10-01 08:22:00 EDT ---

We have a temporary solution merged - we are waiting for a real fix from systemd, but we do not depend on it any more.

--- Additional comment from Allon Mureinik on 2014-10-01 11:24:52 EDT ---

(In reply to Nir Soffer from comment #56)
> We have temporary solution merged - we are waiting for a real fix from
> systemd, but do not depend on it any more.
Let's please open a bug on VDSM to remind us to consume the relevant RPM when it's fixed.

Comment 2 Gal Amado 2014-10-22 13:42:09 UTC
Verified to be working on :

Red Hat Enterprise Virtualization Manager Version: 3.5.0-0.14.beta.el6ev
VDSM: vdsm-4.16.6-1.el7.x86_64
On host with OS: Red Hat Enterprise Linux Server release 7.0 (Maipo)

Comment 3 sky 2015-01-28 13:54:12 UTC
Please backport this patch; it is under test. Have a look:

http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05346.html

[Qemu-devel] [PATCH] block: extend BLOCK_IO_ERROR event with nospace indicator

From: Luiz Capitulino
Subject: [Qemu-devel] [PATCH] block: extend BLOCK_IO_ERROR event with nospace indicator
Date: Fri, 29 Aug 2014 16:07:27 -0400
Management software, such as RHEV's vdsm, want to be able to allocate
disk space on demand. The basic use case is to start a VM with a small
disk and then the disk is enlarged when QEMU hits a ENOSPC condition.

To this end, the management software has to be notified when QEMU
encounters ENOSPC. The solution implemented by this commit is simple:
it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
when QEMU is stopped due to ENOSPC.

Note that support for querying this event is already present in
query-block by means of the 'io-status' key. Also, the new 'nospace'
BLOCK_IO_ERROR field shares the same semantics with 'io-status',
which basically means that werror= has to be set to either
'stop' or 'enospc' to enable 'nospace'.

Finally, this commit also updates the 'io-status' key doc in the
schema with a list of supported device models.

Signed-off-by: Luiz Capitulino <address@hidden>
---

Three important observations:

 1. We've talked with oVirt and OpenStack folks. oVirt folks say that
    this implementation is enough for their use-case. OpenStack don't
    need this feature

 2. While testing this with a raw image on a (smaller) ext2 file mounted
    via the loopback device, I get half "Invalid argument" I/O errors and
    half "No space" errors. This means that half of the BLOCK_IO_ERROR
    events that are emitted for this test-case will have nospace=false
    and the other half nospace=true. I don't know why I'm getting those
    "Invalid argument" errors, can anyone of the block layer comment
    on this? I don't get that with a qcow2 image (I get nospace=true for
    all events)

 3. I think this should go via block tree

 block.c              | 22 ++++++++++++++--------
 qapi/block-core.json |  8 +++++++-
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 1df13ac..b334e35 100644
--- a/block.c
+++ b/block.c
@@ -3632,6 +3632,18 @@ BlockErrorAction bdrv_get_error_action(BlockDriverState 
*bs, bool is_read, int e
     }
 }
 
+static void send_qmp_error_event(BlockDriverState *bs,
+                                 BlockErrorAction action,
+                                 bool is_read, int error)
+{
+    BlockErrorAction ac;
+
+    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
+    qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
+                                   bdrv_iostatus_is_enabled(bs),
+                                   error == ENOSPC, &error_abort);
+}
+
 /* This is done by device models because, while the block layer knows
  * about the error, it does not know whether an operation comes from
  * the device or the block layer (from a job, for example).
@@ -3657,16 +3669,10 @@ void bdrv_error_action(BlockDriverState *bs, 
BlockErrorAction action,
          * also ensures that the STOP/RESUME pair of events is emitted.
          */
         qemu_system_vmstop_request_prepare();
-        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-                                       is_read ? IO_OPERATION_TYPE_READ :
-                                       IO_OPERATION_TYPE_WRITE,
-                                       action, &error_abort);
+        send_qmp_error_event(bs, action, is_read, error);
         qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
     } else {
-        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-                                       is_read ? IO_OPERATION_TYPE_READ :
-                                       IO_OPERATION_TYPE_WRITE,
-                                       action, &error_abort);
+        send_qmp_error_event(bs, action, is_read, error);
     }
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index fb74c56..567e0a6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -336,6 +336,7 @@
 #
 # @io-status: #optional @BlockDeviceIoStatus. Only present if the device
 #             supports it and the VM is configured to stop on errors
+#             (supported device models: virtio-blk, ide, scsi-disk)
 #
 # @inserted: #optional @BlockDeviceInfo describing the device if media is
 #            present
@@ -1569,6 +1570,11 @@
 #
 # @action: action that has been taken
 #
+# @nospace: #optional true if I/O error was caused due to a no-space
+#           condition. This key is only present if query-block's
+#           io-status is present, please see query-block documentation
+#           for more information (since: 2.2)
+#
 # Note: If action is "stop", a STOP event will eventually follow the
 # BLOCK_IO_ERROR event
 #
@@ -1576,7 +1582,7 @@
 ##
 { 'event': 'BLOCK_IO_ERROR',
   'data': { 'device': 'str', 'operation': 'IoOperationType',
-            'action': 'BlockErrorAction' } }
+            'action': 'BlockErrorAction', '*nospace': 'bool' } }
 
 ##
 # @BLOCK_JOB_COMPLETED
-- 
1.9.3
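For context, a BLOCK_IO_ERROR event carrying the new key would look roughly like this (illustrative only; the device name and timestamp are invented, and the exact wire format depends on the QEMU version):

```
{"timestamp": {"seconds": 1409344047, "microseconds": 0},
 "event": "BLOCK_IO_ERROR",
 "data": {"device": "virtio-disk0", "operation": "write",
          "action": "stop", "nospace": true}}
```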

Comment 4 sky 2015-01-28 14:01:05 UTC
This workaround can also reduce pauses or stops during lvextend when the guest OS generates a large amount of concurrent I/O:

[irs]
volume_utilization_percent = 50
volume_utilization_chunk_mb = 2048 
vol_size_sample_interval = 60

As you can see, by default we only check once per minute whether an extension is required.
You could specify a smaller interval in /etc/vdsm/vdsm.conf to check more frequently.
Also, you could increase the chunk_mb value to 2048 or 4096 so that each extension is bigger.

Comment 5 Allon Mureinik 2015-01-28 14:11:30 UTC
Paolo, I assume that you're the person to address comment 3 and 4 at?

Comment 6 sky 2015-01-28 14:28:27 UTC
VM abnormal stop after LV refreshing when using thin provisioning on block storage

comment 4:
in the vdsm source -> vdsm/vdsm/config.py.in

('volume_utilization_chunk_mb', '4096', None)
...

Of course, I have not yet found a better solution in my testing, but I have already tested the comment 4 approach and the effect is very good. I plan to do a balancing patch across the three projects: libvirt, qemu-kvm, and VDSM. I hope better suggestions will be put forward.
My testing is mainly on CentOS 6.x.

Comment 7 Nir Soffer 2015-01-28 17:03:11 UTC
(In reply to sky from comment #4)
> The  workaround Also can reduce a large number of guest OS high concurrent
> I/o read and write for lvextend pause or stop
> 
> [irs]
> volume_utilization_percent = 50
> volume_utilization_chunk_mb = 2048 
> vol_size_sample_interval = 60
> 
> As you can see, by default we only check once per minute if extension is
> required.

This is *not* the configuration we use (default is 2 seconds), and 
changing this is not supported. Your vms *will* pause if you use
this configuration.

Comment 8 Nir Soffer 2015-01-28 17:07:09 UTC
(In reply to sky from comment #3)
> Please backport this patch and under test and have a look

This bug is not related to qemu; it was caused by an undocumented and backward-incompatible
behavior change in udev. It was solved by modifying vdsm's udev
rules.

Looks like you commented on the wrong bug.

