*** Bug 879590 has been marked as a duplicate of this bug. ***
fromani - does the collectd virt plugin (2?) have this?
(In reply to Yaniv Kaul from comment #20)
> fromani - does the collectd virt plugin (2?) have this?

Not directly. The virt plugin reports the operations completed so far
(https://github.com/collectd/collectd/blob/master/src/virt.c#L454).
We need either to configure collectd to aggregate the results somehow and
report the OPS (https://collectd.org/wiki/index.php/Plugin:Aggregation), or
to do the computation on the engine side.
clearing needinfo, answered in https://bugzilla.redhat.com/show_bug.cgi?id=876697#c21
(In reply to Francesco Romani from comment #21)
> (In reply to Yaniv Kaul from comment #20)
> > fromani - does the collectd virt plugin (2?) have this?
>
> Not directly. The virt plugin reports the operations completed so far
> (https://github.com/collectd/collectd/blob/master/src/virt.c#L454).
> We need either to configure collectd to aggregate the results somehow and
> report the OPS (https://collectd.org/wiki/index.php/Plugin:Aggregation), or
> to do the computation on the engine side.

Cumulative is actually what this RFE is all about. Not really 'I/O Operations Per Second' (IOPS) but 'I/O OPerationS' (IOPs?) - quite confusing, but the bottom line is that they indeed seem to want the total number of operations.
So looking at the VDSM code, we actually already collect quite a bit of data on both reads and writes (lib/vdsm/virt/vmstats.py):

    if 'disks' in stat:
        for disk in stat['disks']:
            diskprefix = prefix + '.disk.' + disk
            diskinfo = stat['disks'][disk]

            data[diskprefix + '.read_latency'] = \
                diskinfo['readLatency']
            data[diskprefix + '.read_ops'] = \
                diskinfo['readOps']
            data[diskprefix + '.read_bytes'] = \
                diskinfo['readBytes']
            data[diskprefix + '.read_rate'] = \
                diskinfo['readRate']

            data[diskprefix + '.write_bytes'] = \
                diskinfo['writtenBytes']
            data[diskprefix + '.write_ops'] = \
                diskinfo['writeOps']
            data[diskprefix + '.write_latency'] = \
                diskinfo['writeLatency']
            data[diskprefix + '.write_rate'] = \
                diskinfo['writeRate']

Engine (backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/VdsProperties.java) takes some of them:

    public static final String vm_disk_read_rate = "readRate";
    public static final String vm_disk_write_rate = "writeRate";
    public static final String vm_disk_read_latency = "readLatency";
    public static final String vm_disk_write_latency = "writeLatency";

and even calculates the latencies (in backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/VdsBrokerObjectsBuilder.java).

So we need to add the Ops, at least as raw numbers, and see what comes out.
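To make the flat dict layout above concrete, here is a minimal sketch of how a consumer could pull the raw per-disk operation counters back out of such a dict. This is illustrative only, not oVirt code; the 'vm-1' prefix and the sample values are made up for the example:

    def disk_ops(data, prefix):
        """Collect per-disk read/write operation counters from a flat stats dict."""
        ops = {}
        marker = prefix + '.disk.'
        for key, value in data.items():
            if not key.startswith(marker):
                continue
            # key looks like '<prefix>.disk.<disk>.<counter>'
            disk, _, counter = key[len(marker):].partition('.')
            if counter in ('read_ops', 'write_ops'):
                ops.setdefault(disk, {})[counter] = value
        return ops

    sample = {
        'vm-1.disk.vda.read_ops': 1200,
        'vm-1.disk.vda.write_ops': 340,
        'vm-1.disk.vda.read_rate': 52.0,
    }
    print(disk_ops(sample, 'vm-1'))
    # -> {'vda': {'read_ops': 1200, 'write_ops': 340}}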
This comes from the _disk_iops_bytes function in lib/vdsm/virt/vmstats.py. Francesco or Arik can probably give more detailed info - I see it is called every sampling interval and counts the read/write I/O bytes the VM performs until the next check.
Every sampling interval Vdsm indeed reads (among others) these stats from the libvirt bulk stats:

    "block.<num>.rd.reqs" - number of read requests as unsigned long long.
    "block.<num>.wr.reqs" - number of write requests as unsigned long long.
    "block.<num>.fl.reqs" - total flush requests as unsigned long long.

Those are absolutes, so they are the total number of I/O operations at any given time. Vdsm prefers to return absolute values and leave the computation of the rate to the client (e.g. Engine); this is considered safer. Looking at vmstats.py, it seems that Vdsm is sending absolutes, not rates, to the metrics store. It should be simple to convert the values from absolutes to rates, thus adding the "per second" part.

HTH,
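A minimal sketch of the absolute-to-rate conversion mentioned above (illustrative only, not Vdsm code; the sample counters and timestamps are made up):

    def ops_per_second(prev_ops, prev_ts, cur_ops, cur_ts):
        """Return I/O operations per second between two absolute counter samples."""
        elapsed = cur_ts - prev_ts
        if elapsed <= 0 or cur_ops < prev_ops:
            # counter reset (e.g. VM restart) or bogus interval: no rate available
            return None
        return (cur_ops - prev_ops) / elapsed

    # Example with made-up numbers: 1500 reads at t=100s, 1800 reads at t=115s.
    print(ops_per_second(1500, 100.0, 1800, 115.0))  # -> 20.0 IOPS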
Hi Francesco,

- do we support multiple disks, or is the number summed over all of them?
- any difference for shared disks?
- I suspect we are gathering the info the same way for all storage types, right? (NFS, iSCSI, Gluster...)

At the moment in the metrics store we are only able to aggregate by VM, which is not sufficient; we need to be able to check SD IOPS per VM disk.
oVirt 4.2.0 was released on Dec 20th, 2017. Please consider re-targeting this bug to the next milestone.
(In reply to Lukas Svaty from comment #35)
> Hi Francesco,
>
> - do we support multiple disks, or is the number summed over all of them?
> - any difference for shared disks?
> - I suspect we are gathering the info the same way for all storage types, right? (NFS, iSCSI, Gluster...)
>
> At the moment in the metrics store we are only able to aggregate by VM, which is
> not sufficient; we need to be able to check SD IOPS per VM disk.

We have the disk name as "collectd.type_instance". Why can't we group the vm_disk_write_ops and vm_disk_read_ops per disk name (collectd.type_instance)?
(In reply to Lukas Svaty from comment #35)
> Hi Francesco,
>
> - do we support multiple disks, or is the number summed over all of them?
> - any difference for shared disks?
> - I suspect we are gathering the info the same way for all storage types, right? (NFS, iSCSI, Gluster...)
>
> At the moment in the metrics store we are only able to aggregate by VM, which is
> not sufficient; we need to be able to check SD IOPS per VM disk.

Sorry for the delayed answer.
- we report the stats per-disk
- in which sense do you mean 'shared'? Each VM reports the counters of all its disks, no matter how the backend is implemented. If more than one VM accesses the same image (perhaps an ISO image in RO mode, maybe installation media?), each one will report its own counters. If you want or need to have the counters per-image rather than per-VM, that could be hard to do and Vdsm can't help here.
- correct, the storage type is not relevant here
(In reply to Francesco Romani from comment #38)
> (In reply to Lukas Svaty from comment #35)
> > Hi Francesco,
> >
> > - do we support multiple disks, or is the number summed over all of them?
> > - any difference for shared disks?
> > - I suspect we are gathering the info the same way for all storage types, right? (NFS, iSCSI, Gluster...)
> >
> > At the moment in the metrics store we are only able to aggregate by VM, which is
> > not sufficient; we need to be able to check SD IOPS per VM disk.
>
> Sorry for the delayed answer.
> - we report the stats per-disk
> - in which sense do you mean 'shared'? Each VM reports the counters of all its
> disks, no matter how the backend is implemented. If more than one VM accesses
> the same image (perhaps an ISO image in RO mode, maybe installation media?),
> each one will report its own counters. If you want or

If more than one VM accesses the same image, will they report the same values?

> need to have the counters per-image rather than per-VM, that could be hard to
> do and Vdsm can't help here.
> - correct, the storage type is not relevant here
(In reply to Shirly Radco from comment #39)
> (In reply to Francesco Romani from comment #38)
> > (In reply to Lukas Svaty from comment #35)
> > > Hi Francesco,
> > >
> > > - do we support multiple disks, or is the number summed over all of them?
> > > - any difference for shared disks?
> > > - I suspect we are gathering the info the same way for all storage types, right? (NFS, iSCSI, Gluster...)
> > >
> > > At the moment in the metrics store we are only able to aggregate by VM, which is
> > > not sufficient; we need to be able to check SD IOPS per VM disk.
> >
> > Sorry for the delayed answer.
> > - we report the stats per-disk
> > - in which sense do you mean 'shared'? Each VM reports the counters of all its
> > disks, no matter how the backend is implemented. If more than one VM accesses
> > the same image (perhaps an ISO image in RO mode, maybe installation media?),
> > each one will report its own counters. If you want or
>
> If more than one VM accesses the same image, will they report the same values?

It depends on the usage patterns in the VMs. Let's consider this scenario: we have VM A and VM B that boot from the same shared image. Once the boot is done, they stop at the user login. In this simple scenario we can rightfully expect that, up to the login screen, the counters reported by both VMs will be the same, because the usage pattern of the disk will be exactly the same, thus the number of read operations will be the same (actually also the order of the reads, but that is not reported). In the same example, after the users log in, the usage pattern, thus the disk usage pattern, thus the I/O counters, will become unpredictable, so we cannot expect them to be the same. The takeaway is that the I/O values are always reported per-VM, never per-image.
Shirly, what's the latest here? Is it going to make it to 4.2.2?
As discussed with Yaniv Lavi, the per-VM, per-disk collectd metrics from the virt plugin are sufficient for this RFE:

    collectd.virt.disk_ops.read
    collectd.virt.disk_ops.write

These metrics should allow the user to quickly and easily identify the VMs that generate high I/O, as requested in this RFE. The disk name is saved as collectd.type_instance and the VM name is saved as collectd.plugin_instance.

As for being able to aggregate storage IOPS per VM, it is not possible at this stage, since the disk name is not the drive name - it is what libvirt guesses - therefore it can't be aggregated across VMs. The user can log in to the VM in order to check the drive name.
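For illustration, a minimal sketch of how records carrying these fields could be grouped to find the busiest VM disks. This is not part of ovirt-engine-metrics; the flat record layout and the sample numbers are assumptions made only for the example:

    from collections import defaultdict

    records = [
        {'collectd.plugin_instance': 'vm01', 'collectd.type_instance': 'vda',
         'collectd.virt.disk_ops.read': 1200, 'collectd.virt.disk_ops.write': 300},
        {'collectd.plugin_instance': 'vm02', 'collectd.type_instance': 'vda',
         'collectd.virt.disk_ops.read': 9000, 'collectd.virt.disk_ops.write': 4500},
    ]

    # Sum read + write operations per (VM, disk) pair.
    totals = defaultdict(int)
    for rec in records:
        key = (rec['collectd.plugin_instance'], rec['collectd.type_instance'])
        totals[key] += (rec['collectd.virt.disk_ops.read'] +
                        rec['collectd.virt.disk_ops.write'])

    # Print VM disks sorted by total I/O operations, busiest first.
    for (vm, disk), ops in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(vm, disk, ops)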
The aggregation part will be handled in a separate BZ. Verified in ovirt-engine-metrics-1.1.3.3-1.el7ev.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1488