Bug 1150015

Summary: VM abnormal stop after LV refreshing when using thin provisioning on block storage
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: vdsm
Assignee: Nir Soffer <nsoffer>
Status: CLOSED ERRATA
QA Contact: Aharon Canan <acanan>
Severity: urgent
Priority: high
Version: 3.4.0
CC: acanan, amureini, bazulay, bugs, ebenahar, eblake, ecohen, eedri, fromani, fsimonce, gamado, gklein, iheim, jdenemar, jsuchane, kwolf, lkuchlan, lpeer, lsurette, mgoldboi, michal.skrivanek, nsoffer, ogofen, pdangur, prajnoha, rbalakri, scohen, s.kieske, yeylon, zkabelac
Keywords: ZStream
Target Release: 3.4.3-1
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: vdsm-4.14.17-2.el6ev
Doc Type: Bug Fix
Doc Text:
Cause: When using thin provisioning on block storage, RHEVM creates a 1GiB LV. When the disk fills up to a certain threshold, RHEVM attempts to extend the LV. Extending an LV triggers a udev change event, and vdsm's udev rule is evaluated, setting the permissions of the LV. In recent versions of systemd (RHEL 7, Fedora), udev changed its behavior, removing the SELinux label from devices when setting device permissions (see bug 1147910). This causes the LV to lose the SELinux label assigned by libvirt, which causes the VM to lose access to the LV and pause. When the VM is restarted, libvirt assigns the SELinux label to the VM again.
Consequence: After a thin provisioned disk on block storage is extended automatically, the VM pauses and cannot be resumed. The only way to recover is to shut it down and start it up again.
Fix: The VDSM udev rules were modified so that VDSM images do not use OWNER and GROUP for setting device permissions. Instead, the chown command is run to set the device permissions, so udev does not modify the device's SELinux label.
Result: VMs with thin provisioned disks on block storage no longer pause when an extension is required, and operate properly on RHEL 7 hosts.
Clone Of: 1149705
Last Closed: 2014-11-12 02:29:26 UTC
oVirt Team: Storage
Bug Depends On: 1149705
Bug Blocks: 1155566, 1156075
Attachments:
/var/log/ from the host and engine.log
vdsm logs (part 2)
vdsm logs (part 1)
vdsm logs (part 1-1)

Comment 1 Tal Nisan 2014-10-07 12:29:47 UTC
*** Bug 1150012 has been marked as a duplicate of this bug. ***

Comment 3 Eyal Edri 2014-10-28 14:54:47 UTC
Still missing the patch for 3.4.3-1.

Comment 5 Elad 2014-11-04 09:31:31 UTC
I tested the scenario using a thin disk created on an FC storage domain, and installed an OS on the guest to trigger an automatic extension of the LV.
During the OS installation, the VM stopped:

vdsm.log:

libvirtEventLoop::INFO::2014-11-04 09:08:04,563::vm::4602::vm.Vm::(_onIOError) vmId=`bf87e50e-b931-4504-b5c8-4d704369da34`::abnormal vm stop device virtio-disk0 error enospc

engine.log:

2014-11-04 10:06:53,133 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-89) [6c63f4ff] VM vm_fc_01 bf87e50e-b931-4504-b5c8-4d704369da34 moved from Up --> Paused
2014-11-04 10:06:53,251 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-89) [6c63f4ff] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_fc_01 has paused due to no Storage space error.



libvirt.log:

2014-11-04 09:08:04.530+0000: 107968: debug : qemuProcessHandleIOError:938 : Transitioned guest vm_fc_01 to paused state due to IO error


The VM was unpaused immediately and the OS installation resumed.

I'm moving the bug to ASSIGNED since the VM still gets paused on a storage space error.



Checked on:
vdsm-4.14.17-1.pkvm2_1.1.ppc64
libvirt-1.1.3-1.pkvm2_1.17.11.ppc64
qemu-kvm-1.6.0-2.pkvm2_1.17.10.ppc64



Attaching:
/var/log directory from host and engine.log

Comment 6 Elad 2014-11-04 09:42:47 UTC
Created attachment 953508 [details]
/var/log/ from the host and engine.log

Comment 7 Michal Skrivanek 2014-11-04 11:08:36 UTC
What do you expect the VM to do while the storage is being extended/allocated?

Comment 8 Elad 2014-11-04 12:02:32 UTC
(In reply to Michal Skrivanek from comment #7)
> What do you expect the VM to do while the storage is being
> extended/allocated?

The VM shouldn't get paused; the volume extend operation should occur before the disk runs out of space.

Comment 9 Michal Skrivanek 2014-11-04 12:23:12 UTC
I can't find any regular extension request in vdsm.log. It seems you were either writing too quickly or the highWrite monitoring doesn't work.
Please verify the settings and behavior around the extension threshold, before you reach a full disk. Comment #5 just shows that once you reach ENOSPC the drives get extended and the VM continues fine.

Comment 10 Elad 2014-11-04 12:30:38 UTC
(In reply to Michal Skrivanek from comment #9)
> I can't find any regular extension request in vdsm.log. It seems you were
> either writing too quickly or the highWrite monitoring doesn't work.
> Please verify the settings and behavior around the extension threshold,
> before you reach a full disk. Comment #5 just shows that once you reach
> ENOSPC the drives get extended and the VM continues fine.

Just for clarification - I'm not extending the volume manually. I've created a thin provisioned disk on an FC domain and installed an OS on it. I expect vdsm to perform the lvextend operation automatically when the volume reaches the defined extension threshold.

Comment 11 Michal Skrivanek 2014-11-04 12:33:30 UTC
(In reply to Elad from comment #10)
> Just for clarification - I'm not extending the volume manually. I've
> created a thin provisioned disk on an FC domain and installed an OS on
> it. I expect vdsm to perform the lvextend operation automatically when
> the volume reaches the defined extension threshold.

I'm not saying you are. I'm saying you should verify your threshold and monitoring interval settings and make sure you're not writing at a higher rate than that. If there is an issue in the code, with the highWrite function, then please attach vdsm.log from vdsm startup onwards. I do see some related issues with that from ~5 days ago. The vdsm was restarted since then, so it may not be connected, but more logs always help. But please check what I said about comment #5 first.
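
For reference, the threshold and monitoring interval mentioned above are vdsm configuration values. A minimal sketch of the relevant vdsm.conf entries, assuming the default option names and values (the installed vdsm is the authoritative source):

    [irs]
    # Extend the LV once allocation crosses this percentage of its
    # current size (assumed default: 50).
    volume_utilization_percent = 50
    # Size in MiB of each extension chunk (assumed default: 1024).
    volume_utilization_chunk_mb = 1024

    [vars]
    # How often, in seconds, the highWrite monitor samples the disk
    # watermarks (assumed default: 2).
    vm_watermark_interval = 2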

Comment 12 Elad 2014-11-04 14:09:33 UTC
Created attachment 953617 [details]
vdsm logs (part 2)

Comment 13 Nir Soffer 2014-11-04 14:27:53 UTC
The fact that the VM was unpaused automatically proves that this bug is fixed.

This fix handles the case where the VM is paused after the disk lost its SELinux label and the VM cannot access it. In this state, not only will the VM never unpause, it cannot be resumed manually. The only way to use such a VM is to shut it down and start it again.

What you describe here is an unrelated issue: the VM getting paused for a short time during heavy I/O usage. Please open another bug for it.

Note that we cannot guarantee that a VM will never pause during a heavy I/O workload. We only guarantee that in this case the VM will be unpaused after the disk is extended.
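
For anyone checking a host for the original failure mode, the question is whether the LV's device node still carries the label libvirt assigned. A minimal sketch in Python (the device path is hypothetical):

    import os

    def selinux_label(path):
        # The SELinux context is stored in the "security.selinux"
        # extended attribute; the value is NUL-terminated.
        return os.getxattr(path, "security.selinux").rstrip(b"\x00").decode()

    # Hypothetical device-mapper path for the thin provisioned LV. On an
    # affected host this prints a generic device label after the extend,
    # instead of the svirt image label libvirt assigned (something like
    # "system_u:object_r:svirt_image_t:s0:c...,c...").
    print(selinux_label("/dev/mapper/vg--uuid-lv--uuid"))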

Comment 14 Elad 2014-11-04 14:29:12 UTC
Created attachment 953630 [details]
vdsm logs (part 1)

Comment 15 Elad 2014-11-04 14:51:58 UTC
Created attachment 953645 [details]
vdsm logs (part 1-1)

Comment 16 Elad 2014-11-04 15:11:43 UTC
(In reply to Nir Soffer from comment #13)
> The fact that the VM was unpaused automatically proves that this bug is
> fixed.
> 
> This fix handles the case where the VM is paused after the disk lost its
> SELinux label and the VM cannot access it. In this state, not only will
> the VM never unpause, it cannot be resumed manually. The only way to use
> such a VM is to shut it down and start it again.
> 
> What you describe here is an unrelated issue: the VM getting paused for
> a short time during heavy I/O usage. Please open another bug for it.
> 
> Note that we cannot guarantee that a VM will never pause during a heavy
> I/O workload. We only guarantee that in this case the VM will be
> unpaused after the disk is extended.

Since the described behavior is the expected one, I'm moving the bug to VERIFIED (details in comment #5).

Comment 18 errata-xmlrpc 2014-11-12 02:29:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1844.html

Comment 19 Sven Kieske 2014-11-12 08:35:53 UTC
Neither this BZ nor the errata described at:
https://rhn.redhat.com/errata/RHBA-2014-1844.html
gives a clear description of what the actual bug is and what this fix does.

Could this information be provided somehow?
Thanks in advance

Comment 20 Nir Soffer 2014-12-21 13:58:40 UTC
(In reply to Sven Kieske from comment #19)
> Neither this BZ nor the errata described at:
> https://rhn.redhat.com/errata/RHBA-2014-1844.html
> gives a clear description of what the actual bug is and what this fix
> does.
> 
> Could this information be provided somehow?

The bug:
After a thin provisioned disk on block storage is extended automatically, the VM pauses, and you cannot resume it. The only way to recover is to shut the VM down and start it again.

The root cause:
When using thin provisioning on block storage, oVirt creates a 1GiB LV. When the disk fills up to a certain threshold, oVirt extends the LV. Extending an LV triggers a udev change event, and vdsm's udev rule is evaluated, setting the permissions of the LV. In recent versions of systemd (RHEL 7, Fedora), udev changed its behavior, removing the SELinux label from devices when setting device permissions (bug 1147910). This causes the LV to lose the SELinux label assigned by libvirt, which causes the VM to lose access to the LV and pause. When the VM is restarted, libvirt assigns the SELinux label again.

The fix:
The vdsm udev rules were modified so that vdsm images do not use OWNER and GROUP for setting device permissions. Instead, we run the chown command to set the device permissions, so udev does not modify the device's SELinux label.
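
To illustrate the change, here is a hand-written sketch of a udev rule before and after this kind of fix; it is not the actual rule shipped in vdsm-4.14.17-2, and the match keys, user, and group are placeholders:

    # Before: udev applies ownership itself; on newer systemd this also
    # resets the device's SELinux label (bug 1147910).
    SUBSYSTEM=="block", ENV{DM_LV_NAME}=="?*", OWNER="vdsm", GROUP="kvm"

    # After: ownership is applied by a chown run from the rule instead,
    # so udev never touches the label.
    SUBSYSTEM=="block", ENV{DM_LV_NAME}=="?*", RUN+="/bin/chown vdsm:kvm $env{DEVNAME}"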

Comment 21 Allon Mureinik 2014-12-21 14:08:25 UTC
I've added the above explanation (with some minor spelling and grammar fixes) to the doc-text field.
I think it's too late to be added to the errata, but at least it will appear in the standard location on the bug.