The request is to quickly / easily identify any VM that utilized high I/O, preferably in the Manager itself. The engine should report cumulative storage IOPS values per VM, and these should be collected into DWH and reported via the API.
The correct way to represent this property (regardless of how it's later displayed to the user, UX-wise) is per VM-disk relationship. Thus, the refactoring described in bug 1142762 should be done first. Removing the devel-ack+ until that's done; then we should re-evaluate according to the timeframes.
*** Bug 1168021 has been marked as a duplicate of this bug. ***
This is collected via vdsm into the new metrics store.
Please provide verification steps here. What is reported - disks of VMs only, hosts, the engine? What is the query to search in Kibana? Do we need the guest agent (GA) on VMs to report these?
Created attachment 1330141 [details] kibana_screenshot
vdsm reports vm_disk_read_ops and vm_disk_write_ops per VM. The guest agent is not required to get them. These statistics are collected into the metrics store as collectd.statsd.vm_disk_read_ops and collectd.statsd.vm_disk_write_ops. See the attachment for how to create a graph to check this.
Fields searchable in Kibana:
collectd.plugin:statsd AND (collectd.type:vm_disk_read_ops OR collectd.type:vm_disk_write_ops)

Verified in ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch
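For reference, the same filter can be checked against the metrics store directly; a minimal sketch with the Python Elasticsearch client follows. The endpoint, index pattern and timestamp field name below are assumptions and may differ per deployment.

from elasticsearch import Elasticsearch

# Metrics store endpoint and index pattern are deployment-specific (assumed here).
es = Elasticsearch(["http://metrics-store.example.com:9200"])

# Same filter as the Kibana search above, expressed as a query_string query.
resp = es.search(
    index="project.ovirt-metrics-*",
    body={
        "size": 5,
        "sort": [{"@timestamp": {"order": "desc"}}],  # timestamp field name assumed
        "query": {
            "query_string": {
                "query": "collectd.plugin:statsd AND "
                         "(collectd.type:vm_disk_read_ops OR collectd.type:vm_disk_write_ops)"
            }
        },
    },
)

for hit in resp["hits"]["hits"]:
    src = hit["_source"]
    print(src["collectd"]["plugin_instance"], src["collectd"]["type"], src["collectd"].get("statsd"))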
(In reply to Lukas Svaty from comment #12)
> fields searchable in kibana:
> collectd.plugin:statsd AND ( collectd.type:vm_disk_read_ops OR
> collectd.type:vm_disk_write_ops)

Is it by disk name? ID? How do I search for it? How is it accumulated? Does it work for direct LUN disks?

> verified in
> ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch
Hi,

these are the fields you can search on:
ovirt.entity: "vms"
ovirt.host_id: `id of host`
collectd.plugin_instance: `name of the vm` ("test-vm")
collectd.type: "vm_disk_read_ops"|"vm_disk_write_ops"
collectd.type_instance: disk name (not partition!) ("sda"|"hdc")
ovirt.cluster_name: `name of cluster` ("Default")
collectd.plugin: "statsd"
collectd.statsd.vm_disk_read_ops `read SDIOPS`
collectd.statsd.vm_disk_write_ops `write SDIOPS`

collectd namespace of an example sample:
"collectd": {
    "dstypes": [
        "gauge"
    ],
    "interval": 10,
    "plugin": "statsd",
    "plugin_instance": "test-vm",
    "type": "vm_disk_read_ops",
    "type_instance": "sda",
    "statsd": {
        "vm_disk_read_ops": 27436
    }
}

How these stats are collected on the vdsm side I do not know, as the patches included in the BZ were abandoned. Shirly, can you share the patch, or some insights on this?

As per your suggestion I want to re-test this with direct LUN and maybe also with different storage types. Moving back to ON_QA for now.
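Since vm_disk_read_ops appears to be a cumulative counter (27436 in the sample above), a rough sketch of turning it into a rate with a date_histogram plus derivative aggregation is below. The endpoint, index pattern, timestamp field and the assumption that these fields are not analyzed (so term filters match) may not hold on every deployment.

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://metrics-store.example.com:9200"])  # assumed endpoint

# Approximate read ops/s for one disk of one VM by deriving the cumulative counter.
resp = es.search(
    index="project.ovirt-metrics-*",  # assumed index pattern
    body={
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"collectd.plugin": "statsd"}},
                    {"term": {"collectd.type": "vm_disk_read_ops"}},
                    {"term": {"collectd.plugin_instance": "test-vm"}},
                    {"term": {"collectd.type_instance": "sda"}},
                ]
            }
        },
        "aggs": {
            "per_minute": {
                "date_histogram": {"field": "@timestamp", "interval": "1m"},
                "aggs": {
                    "ops": {"max": {"field": "collectd.statsd.vm_disk_read_ops"}},
                    "ops_rate": {"derivative": {"buckets_path": "ops"}},
                },
            }
        },
    },
)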
(In reply to Lukas Svaty from comment #14)
> Hi,
>
> these are the fields you can search on:
> ovirt.entity: "vms"
> ovirt.host_id: `id of host`
> collectd.plugin_instance: `name of the vm` ("test-vm")
> collectd.type: "vm_disk_read_ops"|"vm_disk_write_ops"
> collectd.type_instance: disk name (not partition!) ("sda"|"hdc")

This sounds incorrect to me - can you try:
1. With multiple disks
2. With a different OS (Windows, for example).
What about hot-plugged disks?

> ovirt.cluster_name: `name of cluster` ("Default")
> collectd.plugin: "statsd"
> collectd.statsd.vm_disk_read_ops `read SDIOPS`
> collectd.statsd.vm_disk_write_ops `write SDIOPS`
>
> collectd namespace of example sample:
> "collectd": {
>     "dstypes": [
>         "gauge"
>     ],
>     "interval": 10,
>     "plugin": "statsd",
>     "plugin_instance": "test-vm",
>     "type": "vm_disk_read_ops",
>     "type_instance": "sda",
>     "statsd": {
>         "vm_disk_read_ops": 27436
>     }
>
> How are this stats collectd on vdsm side I do not know as the patches
> included in bz were abandoned. Shirly can you share the patch, or some
> insights on this?
>
> As per your suggestion I want to re-test this direct-lun and maybe as well
> with different storage types. Moving back to ON_QA for now.
As VM disk stats are reported by vdsm, I do not know specifically how they are collected. Adding Yaniv B - can you please elaborate on the VM disk statistics?
https://gerrit.ovirt.org/59066 is the first commit, but we have more patches around this code that changed the format and similar details. Does that answer your question?
No. How does vdsm collect it?
The same way vdsm collects all virt stats and reports them to the engine: we just send the same values that come from the libvirt stats requests we poll every interval.
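This is not the actual vdsm implementation - just a minimal illustration, with placeholder connection URI, VM and device names, of how the per-disk read/write operation counters can be polled from libvirt with the Python bindings:

import time
import libvirt

conn = libvirt.open("qemu:///system")   # placeholder URI
dom = conn.lookupByName("test-vm")      # placeholder VM name

for _ in range(3):
    # blockStats() returns (rd_req, rd_bytes, wr_req, wr_bytes, errs);
    # rd_req / wr_req are the cumulative read/write operation counts.
    rd_req, rd_bytes, wr_req, wr_bytes, errs = dom.blockStats("vda")
    print("vda read_ops=%d write_ops=%d" % (rd_req, wr_req))
    time.sleep(10)  # the 10s interval matches the sample above; vdsm uses its own stats interval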
These type_instance values are reported at the moment:
- hdc
- vda
- sda

This is for both Windows and Linux based VMs, as well as VMs with multiple disks. As we should be able to aggregate at least by VM disk, not only by VM name, I am moving this RFE back to ASSIGNED.

Shirly: Can you please specify what these values are? As we do not require ovirt-guest-agent, I doubt they are partitions inside the VM. Without a feature design/specification of what these values represent and what their benefit is, I am unable to do verification.

For this RFE I would like to test:
1. Aggregation by disks (to see overused disks in VMs)
2. Data correctness
3. Support for Linux/Windows based images.
4. Support for direct LUN and different storage types (NFS, Gluster, iSCSI...)
Yaniv, can we report a more meaningful name as the disk name? The current name does not help the user understand which storage domain it belongs to.
Those are the fields I get from libvirt:

"disks": {
    "vda": {
        "readLatency": "0",
        "writtenBytes": "1597392896",
        "truesize": "7324557312",
        "apparentsize": "7324499968",
        "readOps": "122422",
        "writeLatency": "13576322",
        "imageID": "f02d032d-88d8-4b84-8cfc-5a42c6fc884a",
        "readBytes": "9565093376",
        "flushLatency": "613964",
        "readRate": "0.0",
        "writeOps": "32742",
        "writeRate": "614.4"
    },
    "hdc": {
        "readLatency": "0",
        "writtenBytes": "0",
        "truesize": "0",
        "apparentsize": "0",
        "readOps": "4",
        "writeLatency": "0",
        "readBytes": "152",
        "flushLatency": "0",
        "readRate": "0.0",
        "writeOps": "0",
        "writeRate": "0.0"
    }
}

The storage-domains report, which comes in the host stats, contains the SD UUID:

"518ea2d6-26a7-4705-88e6-ae83f1ff15e3": {
    "code": 0,
    "actual": true,
    "acquired": true,
    "delay": "0.000872629",
    "lastCheck": "2.3",
    "version": 4,
    "valid": true
}

but it's per VM, and the name is the drive name - why isn't it meaningful?
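For illustration only, a tiny sketch of pulling the per-disk operation counters out of a dict shaped like the sample above; vm_stats is a hypothetical variable, not an actual vdsm API:

# vm_stats mirrors the structure of the libvirt-derived sample above (values trimmed).
vm_stats = {
    "disks": {
        "vda": {"readOps": "122422", "writeOps": "32742"},
        "hdc": {"readOps": "4", "writeOps": "0"},
    }
}

for disk, stats in vm_stats["disks"].items():
    # The counters arrive as strings and are cumulative per disk.
    print("%s: read_ops=%d write_ops=%d" % (disk, int(stats["readOps"]), int(stats["writeOps"])))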
The name is not the drive name - it's what libvirt guesses the drive name might be. It would show 'vda' for a virtio-blk device in a Windows VM as well...
Shirly, what's the latest here? Is it going to make it to 4.2.2?
As discussed with Yaniv Lavi, the collectd per-VM, per-disk metrics from the virt plugin are sufficient for this RFE:
collectd.virt.disk_ops.read
collectd.virt.disk_ops.write

These metrics should allow the user to quickly / easily identify the VMs that utilized high I/O, as requested in this RFE. The disk name is saved as collectd.type_instance, and the VM name is saved as collectd.plugin_instance.

As for the second request, to aggregate storage IOPS per VM, it is not possible at this stage: the disk name is not the drive name - it's what libvirt guesses - therefore it can't be aggregated across VMs.
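To sanity-check the "identify high I/O VMs" use case against these metrics, here is a rough sketch of a terms plus max aggregation per VM name. The endpoint, index pattern and field mappings (including the "disk_ops" type value) are assumptions, and since the counters are cumulative a derivative would still be needed for true IOPS:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://metrics-store.example.com:9200"])  # assumed endpoint

resp = es.search(
    index="project.ovirt-metrics-*",  # assumed index pattern
    body={
        "size": 0,
        "query": {"term": {"collectd.type": "disk_ops"}},  # virt plugin disk ops samples (assumed mapping)
        "aggs": {
            "by_vm": {
                "terms": {"field": "collectd.plugin_instance", "size": 10},
                "aggs": {
                    "read_ops": {"max": {"field": "collectd.virt.disk_ops.read"}},
                    "write_ops": {"max": {"field": "collectd.virt.disk_ops.write"}},
                },
            }
        },
    },
)

for bucket in resp["aggregations"]["by_vm"]["buckets"]:
    print(bucket["key"], bucket["read_ops"]["value"], bucket["write_ops"]["value"])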
Moving to VERIFIED; will create a separate bug for reporting of disk stats from inside the guest OS. Verified in ovirt-engine-metrics-1.1.3.3-1.el7ev.noarch
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.