+++ This bug was initially created as a clone of Bug #1116558 +++
Description of problem:
During snapshot deletion (on an NFS datacenter share), vdsm on the SPM issues a qemu-img command to merge the disk images. This process reads data through the hypervisor page cache. Because this data may look hotter than any VM memory, the system starts swapping out memory pages of the running VMs.
Since each VM disk is accessed only once during that process, the system should avoid using the page cache:
- Either by accessing the files with direct I/O
- Or by clearing the page cache in regular intervals during that operation
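The second option above can be sketched in a few lines of Python. This is only an illustration of the idea, not what vdsm or qemu-img actually do: the function name, the chunk size and the drop interval are made up for the example, and `POSIX_FADV_DONTNEED` is only a hint that the kernel may or may not honour on a given filesystem.

```python
import os

CHUNK = 1024 * 1024       # bytes per read call (assumed value)
DROP_EVERY = 64 * CHUNK   # drop cached pages after this many bytes (assumed value)


def read_once_dropping_cache(path, consume=lambda data: None):
    """Read a file sequentially and regularly ask the kernel to drop its cached pages.

    Hypothetical helper; returns the number of bytes read.
    """
    can_advise = hasattr(os, "posix_fadvise")  # not available on every platform
    total = 0
    since_drop = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, CHUNK)
            if not data:
                break
            consume(data)
            total += len(data)
            since_drop += len(data)
            if can_advise and since_drop >= DROP_EVERY:
                # Advise that everything read so far will not be needed again.
                os.posix_fadvise(fd, 0, total, os.POSIX_FADV_DONTNEED)
                since_drop = 0
        if can_advise:
            # Length 0 means "up to the end of the file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Direct I/O (the first option) avoids the cache entirely but needs aligned buffers and filesystem support, which is why the sketch uses the advisory call instead.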
Version-Release number of selected component (if applicable):
Fedora 20 hypervisor node (kernel 3.14.8-200.fc20.x86_64)
Steps to Reproduce:
1. Choose an oVirt NFS-based datacenter
2. Choose a VM with a disk that is larger than the free memory on the SPM node
3. Stop the VM
4. Create a snapshot of the large disk (takes a few seconds)
5. Delete the snapshot (takes a long time)
6. Watch swapping on the SPM node
Actual results:
- page cache usage on SPM increases
- memory fills up to 100%
- paging swaps out memory of VMs
Expected results:
- No paging should occur
Additional info:
- This behaviour can be seen even with swappiness set to 0
- Graphs attached
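For anyone reproducing this without the attached graphs, the last reproduction step (watching swapping on the SPM node) could be scripted roughly as follows. The helper names are hypothetical; the sketch only assumes the `pswpin` and `pswpout` counters that Linux exposes in /proc/vmstat.

```python
def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counters from the content of /proc/vmstat."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return after[0] - before[0], after[1] - before[1]
```

Sampling /proc/vmstat once before and once during the snapshot deletion and passing both results to `swap_delta` gives the number of pages swapped in and out in between; a growing second value means VM memory is being pushed to swap.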
I am setting the component of this bz to ovirt-engine-backend because, if I am not mistaken, at the oVirt workshop Markus Stockhausen pointed out that the VMs were migrated off the host even before memory swapping started.
If that's really the case then there's probably an issue on the engine side, where we treat cached memory as used, unreclaimable memory.
If that's not the case then we should switch this bz to vdsm to take advantage of the new qemu-img option to use O_DIRECT on read as well.
(We may want to do this regardless but let's wait for feedback first)
Markus, can you please clarify what the exact behavior is? Are the VMs migrated out before the host starts swapping? Thanks.
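To make the suspected engine-side issue concrete: the difference between counting page cache as used memory and treating it as reclaimable can be shown with a small /proc/meminfo calculation. This is not the engine's actual accounting, just a sketch with hypothetical helper names; it also simplifies, since part of `Cached` (for example shared memory) is not really reclaimable.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens:
            fields[name.strip()] = int(tokens[0])
    return fields


def used_kib(fields, count_cache_as_used):
    """Used memory in KiB, with or without buffers and page cache counted as used."""
    used = fields["MemTotal"] - fields["MemFree"]
    if not count_cache_as_used:
        used -= fields.get("Buffers", 0) + fields.get("Cached", 0)
    return used
```

On a host whose page cache has been filled by a large sequential read, the two results differ by almost the whole cache size, which would be enough to make a memory-based migration policy fire even though nothing has been swapped yet.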
This is another bug about heavy swapping. What you are referring to (and what I tried to explain at the oVirt workshop) is the swapping during online migration:
But back to this one: the swapping starts while the qemu-img commands are running. The cause was that qemu reads through the page cache for the above-mentioned commands. In the meantime there has been a bugfix for that behaviour: qemu can be passed a parameter to avoid the page cache. More details here:
This parameter should be set from oVirt/RHEV if it finds a patched qemu version.
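A rough sketch of what "set the parameter if qemu supports it" could look like on the vdsm side. Several things here are assumptions: that the parameter meant is qemu-img's `-T src_cache` option with cache mode `none`, that `convert` is the subcommand involved, and that support can be detected by looking for `src_cache` in the qemu-img help output (the help format differs between qemu versions). The function names are hypothetical and not taken from vdsm.

```python
def help_mentions_src_cache(help_text):
    """Heuristic only: does this qemu-img help output mention a source cache option?"""
    return "src_cache" in help_text


def build_convert_cmd(src, dst, out_fmt, bypass_src_cache):
    """Build a qemu-img convert argv, optionally reading the source without page cache."""
    cmd = ["qemu-img", "convert"]
    if bypass_src_cache:
        # Assumed option: -T selects the source cache mode, "none" bypasses the page cache.
        cmd += ["-T", "none"]
    cmd += ["-O", out_fmt, src, dst]
    return cmd
```

The caller would capture the help output of the installed qemu-img once, pass the result of `help_mentions_src_cache` as `bypass_src_cache`, and fall back to the plain command on older versions.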
Thanks for the clarification Markus. Moving to vdsm per comment 2.
We already have a vdsm bug 1138690 - I think we should close this as duplicate.
*** This bug has been marked as a duplicate of bug 1138690 ***