Description of problem:

Add an event to report when a block device's usage exceeds a threshold. The threshold should be configurable, and the event should report the affected block device.

Rationale for the RFE:

Managing applications, like oVirt (http://www.ovirt.org), make extensive use of thin-provisioned disk images. In order to let the guest run flawlessly and not be unnecessarily paused, oVirt sets a watermark and automatically resizes the image once the watermark is reached or exceeded. To detect the mark crossing, the managing application has no choice but to aggressively poll the disk's highest written sector, using virDomainGetBlockInfo or the recently added bulk stats equivalent. However, oVirt needs to do very frequent polling. In general, this usage leads to unnecessary system load, and it gets even worse at scale: scenarios with hundreds of VMs are becoming not unusual.

A patch for QEMU implementing a disk usage threshold was posted on qemu-devel, reviewed and acked. Once it is accepted, libvirt should expose this event. This BZ entry is to track the libvirt support.

Additional info:

QEMU upstream bug: https://bugs.launchpad.net/qemu/+bug/1338957?comments=all
It includes a link to the QEMU API.
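For context, the polling loop described above looks roughly like this with the libvirt Python bindings; this is a minimal sketch, and the domain name, device name, watermark value, and request_extension() helper are all illustrative assumptions:

    import time
    import libvirt

    WATERMARK = 512 * 1024 * 1024   # free-space threshold in bytes (illustrative)

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('example-vm')    # hypothetical domain

    while True:
        # virDomainGetBlockInfo: returns [capacity, allocation, physical]
        capacity, alloc, physical = dom.blockInfo('vda')
        if physical - alloc < WATERMARK:
            request_extension(dom, 'vda')    # hypothetical resize helper
        time.sleep(2)   # frequent polling; multiplied by hundreds of VMs,
                        # this load is what motivates the requested event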
Exposing the new event should be easy; the hard part will be figuring out an interface for the user to request that the event should happen. I suspect we need a new API (and thus can't rebase this to happen any sooner than RHEL 7.2) that lets a user register a size at which to trigger a threshold event.
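For reference, the registration call that eventually materialized (virDomainSetBlockThreshold, libvirt 3.2.0; see the last comment below) has exactly this shape, a device plus a byte offset. A sketch with the Python bindings, assuming a device name of 'vda':

    # fire VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD once a write on 'vda'
    # goes past the 10 GiB offset
    dom.setBlockThreshold('vda', 10 * 1024 ** 3, 0)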
(In reply to Eric Blake from comment #1)
> Exposing the new event should be easy; the hard part will be figuring out
> an interface for the user to request that the event should happen. I
> suspect we need a new API (and thus can't rebase this to happen any sooner
> than RHEL 7.2) that lets a user register a size at which to trigger a
> threshold event.

RHEL 7.2 should be fine for us (= oVirt/RHEV).
(In reply to Eric Blake from comment #1)
> Exposing the new event should be easy; the hard part will be figuring out
> an interface for the user to request that the event should happen. I
> suspect we need a new API (and thus can't rebase this to happen any sooner
> than RHEL 7.2) that lets a user register a size at which to trigger a
> threshold event.

Regarding the API, I'll briefly describe what oVirt currently does.

The task is done by VDSM, the oVirt node management daemon. Periodically, each disk of each VM is sampled; disk images whose format is not cow, or which are not block devices, are immediately discarded.

(Python-ish pseudo-code follows.)

For each disk:

- grab the block info:

    capacity, alloc, physical = virDomainGetBlockInfo(drive.path, 0)

- check whether the drive should be extended or not:

    def _shouldExtendVolume(self, drive, capacity, alloc, physical):
        # always use the freshest data
        nextPhysSize = physical + drive.VOLWM_CHUNK_MB * constants.MEGAB

        # NOTE: the intent of this check is to prevent faulty images from
        # tricking qemu into requesting extremely large extensions
        # (BZ#998443). Probably the definitive check would be comparing
        # the allocated space with capacity + format_overhead. Anyway,
        # given that:
        #
        # - format_overhead is tricky to compute (it depends on a few
        #   assumptions that may change in the future, e.g. cluster size)
        # - currently we only allow extending by one chunk at a time
        #
        # the current check compares alloc with the next volume size.
        # It should be noted that alloc cannot be directly compared with
        # the volume physical size, as it also includes the clusters not
        # yet written (pending).
        if alloc > nextPhysSize:
            pause_vm_using_the_disk()
            raise Exception
        return physical - alloc < drive.watermarkLimit()

- the drive's watermarkLimit is expressed as a percentage of the drive's apparent size, and is possibly adjusted to accommodate live storage migration.

Constants:

    VOLWM_CHUNK_MB = configfile.getint('volume_utilization_chunk_mb')  # default: 1024
    VOLWM_FREE_PCT = 100 - config.getint('irs', 'volume_utilization_percent')  # default: 50
    VOLWM_CHUNK_REPLICATE_MULT = 2  # chunk multiplier during replication

    def volExtensionChunk(drive):
        """
        Return the volume extension chunk (used for thin provisioning on
        block devices). The value is based on the vdsm configuration, but
        can also change dynamically according to the VM's needs (e.g. it
        increases during a live storage migration).
        """
        if drive.isDiskReplicationInProgress():
            return drive.VOLWM_CHUNK_MB * drive.VOLWM_CHUNK_REPLICATE_MULT
        return drive.VOLWM_CHUNK_MB

    def watermarkLimit(drive):
        """
        Return the watermark limit; when the LV usage reaches this limit,
        an extension is in order (thin provisioning on block devices).
        """
        return (drive.VOLWM_FREE_PCT * volExtensionChunk(drive) *
                constants.MEGAB / 100)

HTH
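To make the defaults above concrete (volume_utilization_chunk_mb = 1024, volume_utilization_percent = 50, no replication in progress), a quick check of the numbers, not oVirt code:

    VOLWM_CHUNK_MB = 1024          # default extension chunk, in MiB
    VOLWM_FREE_PCT = 100 - 50      # = 50
    MEGAB = 2 ** 20

    limit = VOLWM_FREE_PCT * VOLWM_CHUNK_MB * MEGAB // 100
    print(limit)                   # 536870912 bytes = 512 MiB

So with the defaults, an extension by one 1 GiB chunk is requested as soon as the free space in the LV drops below 512 MiB.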
Current libvirt proposal for exposing this: https://www.redhat.com/archives/libvir-list/2015-May/msg00580.html
Now upstream in v3.2.0, culminating with this commit:

commit 91c3d430c96ca365ae40bf922df3e4f83295e331
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 16 14:37:56 2017 +0100

    qemu: stats: Display the block threshold size in bulk stats

    Management tools may want to check whether the threshold is still set
    if they missed an event. Add the data to the bulk stats API where they
    can also query the current backing size at the same time.
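A minimal sketch of the resulting workflow with the Python bindings (libvirt >= 3.2.0); the domain and device names are assumptions, while the calls are the pieces referenced in this bug: virDomainSetBlockThreshold, VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, and the 'block.<n>.threshold' bulk stats field added by the commit above:

    import libvirt

    def on_threshold(conn, dom, dev, path, threshold, excess, opaque):
        # fired once the guest writes past 'threshold' on 'dev';
        # this replaces the polling loop from the description
        print('%s: %s exceeded threshold %d by %d bytes'
              % (dom.name(), dev, threshold, excess))

    libvirt.virEventRegisterDefaultImpl()    # needed before open() for events
    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('example-vm')    # hypothetical domain

    conn.domainEventRegisterAny(
        dom, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, on_threshold, None)
    dom.setBlockThreshold('vda', 10 * 1024 ** 3, 0)  # event at the 10 GiB offset

    # if an event was missed, the threshold is still visible in the bulk
    # stats as 'block.<n>.threshold' (the field added by the commit above)
    stats = conn.domainListGetStats([dom], libvirt.VIR_DOMAIN_STATS_BLOCK)
    print(stats[0][1].get('block.0.threshold'))

    while True:                              # dispatch events
        libvirt.virEventRunDefaultImpl()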