*** Bug 1150012 has been marked as a duplicate of this bug. ***
still missing the patch for 3.4.3-1
I tested the scenario using a thin disk created on an FC storage domain and installed an OS on the guest to trigger an extension of the LV. During the OS installation, the VM stops:

vdsm.log:
libvirtEventLoop::INFO::2014-11-04 09:08:04,563::vm::4602::vm.Vm::(_onIOError) vmId=`bf87e50e-b931-4504-b5c8-4d704369da34`::abnormal vm stop device virtio-disk0 error enospc

engine.log:
2014-11-04 10:06:53,133 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-89) [6c63f4ff] VM vm_fc_01 bf87e50e-b931-4504-b5c8-4d704369da34 moved from Up --> Paused
2014-11-04 10:06:53,251 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-89) [6c63f4ff] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_fc_01 has paused due to no Storage space error.

libvirt.log:
2014-11-04 09:08:04.530+0000: 107968: debug : qemuProcessHandleIOError:938 : Transitioned guest vm_fc_01 to paused state due to IO error

The VM is unpaused immediately and the OS installation resumes. I'm moving the bug to ASSIGNED since the VM still gets paused on a storage space error.

Checked on:
vdsm-4.14.17-1.pkvm2_1.1.ppc64
libvirt-1.1.3-1.pkvm2_1.17.11.ppc64
qemu-kvm-1.6.0-2.pkvm2_1.17.10.ppc64

Attaching: /var/log directory from the host and engine.log
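For anyone reproducing this, one way to watch the automatic extension from the host is to poll the LV size while the guest installs. A rough sketch; <vg_name> is a placeholder for the storage domain's actual volume group:

# Poll LV sizes once per second; the image LV should grow in chunks
# as vdsm extends it, before the guest hits ENOSPC.
watch -n 1 "lvs --units g -o lv_name,lv_size <vg_name>"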
Created attachment 953508 [details] /var/log/ from the host and engine.log
what do you expect the VM to do while the storage is being extended/allocated?
(In reply to Michal Skrivanek from comment #7)
> what do you expect the VM to do while the storage is being
> extended/allocated?

The VM shouldn't get paused; the volume extend operation should occur before the disk runs out of space.
I can't find any regular extension request in vdsm.log. It seems you were either writing too quickly or the highWrite monitoring doesn't work. Please verify the settings and behavior around the extension threshold... before you reach a full disk. Comment #5 just shows that once you reach ENOSPC the drives get extended and it continues ok.
(In reply to Michal Skrivanek from comment #9)
> I can't find any regular extension request in vdsm.log. It seems you were
> either writing too quickly or the highWrite monitoring doesn't work.
> Please verify the settings and behavior around the extension threshold...
> before you reach a full disk. Comment #5 just shows that once you reach
> ENOSPC the drives get extended and it continues ok.

Just for clarification - I'm not extending the volume manually. I've created a thin provisioned disk on an FC domain and installed an OS on it. I expect vdsm to perform the lvextend operation automatically when the volume reaches the defined extension threshold.
(In reply to Elad from comment #10)
> Just for clarification - I'm not extending the volume manually. I've
> created a thin provisioned disk on an FC domain and installed an OS on it.
> I expect vdsm to perform the lvextend operation automatically when the
> volume reaches the defined extension threshold.

I'm not saying you are. I'm saying you should verify your threshold and monitoring interval settings and make sure you're not writing at a higher rate than that. If there is an issue in the code, with the highWrite function, then please attach vdsm.log since vdsm startup. I do see some related issues with that from ~5 days ago. Since then vdsm was restarted, so it may not be connected, but still... more logs always help. But please check what I said in comment #5 first.
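For reference, the extension threshold and chunk size are controlled by vdsm configuration, along these lines in /etc/vdsm/vdsm.conf. The values shown are the usual defaults; treat the exact option names as something to verify against your vdsm version:

[irs]
# extend the LV once it is this percent full...
volume_utilization_percent = 50
# ...growing it by this many megabytes at a time
volume_utilization_chunk_mb = 1024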
Created attachment 953617 [details] vdsm logs (part 2)
The fact that the VM was unpaused automatically proves that this bug is fixed.

This fix handles the case where the VM is paused after the disk lost its selinux label and the VM cannot access it. In this state, not only will the VM never unpause, it cannot be resumed manually either. The only way to use such a VM is to shut it down and start it again.

What you describe here is an unrelated issue: the VM getting paused for a short time during heavy IO usage. Please open another bug for this issue.

Note that we cannot guarantee that a VM will never pause during a heavy IO workload. We only guarantee that the VM will be unpaused in this case after the disk was extended.
Created attachment 953630 [details] vdsm logs (part 1)
Created attachment 953645 [details] vdsm logs (part 1-1)
(In reply to Nir Soffer from comment #13)
> The fact that the VM was unpaused automatically proves that this bug is
> fixed.
>
> This fix handles the case where the VM is paused after the disk lost its
> selinux label and the VM cannot access it. In this state, not only will
> the VM never unpause, it cannot be resumed manually either. The only way
> to use such a VM is to shut it down and start it again.
>
> What you describe here is an unrelated issue: the VM getting paused for a
> short time during heavy IO usage. Please open another bug for this issue.
>
> Note that we cannot guarantee that a VM will never pause during a heavy IO
> workload. We only guarantee that the VM will be unpaused in this case
> after the disk was extended.

Since the described behavior is the expected one, moving the bug to VERIFIED (details in comment #5).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2014-1844.html
Neither this BZ nor the errata at https://rhn.redhat.com/errata/RHBA-2014-1844.html contains a clear description of what the actual bug is and what this fix does.

Could this information be provided somehow? Thanks in advance
(In reply to Sven Kieske from comment #19)
> Neither this BZ nor the errata at
> https://rhn.redhat.com/errata/RHBA-2014-1844.html contains a clear
> description of what the actual bug is and what this fix does.
>
> Could this information be provided somehow?

The bug: after a thin provisioned disk on block storage is extended automatically, the VM pauses and you cannot resume it. The only way to resume is to shut the VM down and start it again.

The root cause: when using thin provisioning on block storage, oVirt creates a 1 GiB LV. When the disk becomes too full, oVirt extends the LV. Extending an LV triggers a udev change event, and the vdsm udev rule is evaluated, setting the permissions of the LV. In recent versions of systemd (el7, fedora), udev changed its behavior, removing the selinux label from devices when setting device permissions (bug 1147910). This causes the LV to lose the selinux label assigned by libvirt, which causes the VM to lose access to the LV and pause. When the VM is restarted, libvirt assigns the selinux label to the device again.

The fix: the vdsm udev rules were modified so that vdsm images do not use OWNER and GROUP to set device permissions. Instead we run the chown command to set the device permissions, so udev does not modify the device's selinux label.
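For illustration, a minimal sketch of the difference between the two approaches. The match keys and <image-lv> placeholder below are illustrative, not the exact vdsm rule:

# Before: udev applies the permissions itself; on affected systemd
# versions this also resets the selinux label on the device.
ACTION=="change", SUBSYSTEM=="block", ENV{DM_LV_NAME}=="<image-lv>", OWNER="vdsm", GROUP="kvm", MODE="0660"

# After: an external chown sets the ownership instead, leaving the
# selinux label assigned by libvirt untouched.
ACTION=="change", SUBSYSTEM=="block", ENV{DM_LV_NAME}=="<image-lv>", RUN+="/bin/chown vdsm:kvm $env{DEVNAME}"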
I've added the above explanation (with some minor spelling and grammar fixes) to the doc-text field. I think it's too late for it to be added to the errata, but at least it will appear in the standard location on the bug.