Bug 1317433 (ovirt_report_storage_iops_accum_per_vm) - [RFE] Report Storage IOPS cumulative values per VM per disk via metric store.
Summary: [RFE] Report Storage IOPS cumulative values per VM per disk via metric store.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: ovirt_report_storage_iops_accum_per_vm
Product: ovirt-engine-metrics
Classification: oVirt
Component: RFEs
Version: 1.0.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.2
Assignee: Shirly Radco
QA Contact: Lukas Svaty
URL:
Whiteboard:
Duplicates: 1168021 (view as bug list)
Depends On: ovirt_refactor_disk_class_hierarchy
Blocks: 876697 880593 1168021 1168026
 
Reported: 2016-03-14 09:10 UTC by Yaniv Lavi
Modified: 2018-03-29 10:59 UTC
CC: 29 users

Fixed In Version:
Clone Of: 876697
Environment:
Last Closed: 2018-03-29 10:59:21 UTC
oVirt Team: Metrics
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: exception+
lsvaty: testing_plan_complete-
ylavi: planning_ack+
rule-engine: devel_ack+
ratamir: testing_ack+


Attachments
kibana_screenshot (123.52 KB, image/png)
2017-09-24 09:39 UTC, Shirly Radco
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 880611 | 1 | None | None | None | 2022-03-14 10:24:24 UTC
oVirt gerrit | 38759 | 0 | None | ABANDONED | core: storing iops into db | 2019-11-28 15:24:16 UTC
oVirt gerrit | 38760 | 0 | None | ABANDONED | core: use offsets to storing iops counts surviving device hotplugs/restarts. | 2019-11-28 15:24:16 UTC

Internal Links: 880611

Description Yaniv Lavi 2016-03-14 09:10:02 UTC
The request is to quickly and easily identify any VM that utilizes high I/O, preferably in the manager itself. The engine should report cumulative storage IOPS values per VM, and these should be collected into the DWH and reported via the API.

Comment 1 Allon Mureinik 2016-03-16 09:44:11 UTC
The correct way to represent this property (regardless of how it's later displayed to the user, UX-wise) is per vm-disk relationship. Thus, the refactoring described in bug 1142762 should be done first.

Removing the devel-ack+ until that's done, and then we should re-evaluate according to the timeframes.

Comment 5 Yaniv Lavi 2017-07-31 08:20:54 UTC
*** Bug 1168021 has been marked as a duplicate of this bug. ***

Comment 6 Yaniv Lavi 2017-07-31 08:22:39 UTC
This is collected via vdsm into the new metrics store.

Comment 9 Lukas Svaty 2017-09-21 15:33:24 UTC
Please provide verification steps here.

What is reported: VM disks only, hosts, the engine?
What is the query to search for in Kibana?
Do we need GA on the VMs to report these?

Comment 10 Shirly Radco 2017-09-24 09:39:33 UTC
Created attachment 1330141 [details]
kibana_screenshot

Comment 11 Shirly Radco 2017-09-24 09:40:06 UTC
vdsm reports vm_disk_read_ops and vm_disk_write_ops per VM.

GA is not required to get them.

These statistics are collected into the metrics store as collectd.statsd.vm_disk_read_ops and collectd.statsd.vm_disk_write_ops.

See the attachment for how to create a graph to check this.

Comment 12 Lukas Svaty 2017-10-09 13:03:29 UTC
Fields searchable in Kibana:
collectd.plugin:statsd AND ( collectd.type:vm_disk_read_ops OR collectd.type:vm_disk_write_ops)

verified in ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch
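
A minimal sketch of running the same search directly against the metrics-store Elasticsearch API from Python (record layout as in the sample in comment 14); the endpoint and index pattern below are assumptions and will differ per deployment:

import requests

ES_URL = "https://elasticsearch.example.com:9200"   # assumed endpoint
INDEX = "project.ovirt-metrics-*"                   # assumed index pattern

# Same filter as the Kibana search above, expressed as a query_string query.
query = {
    "size": 5,
    "query": {
        "query_string": {
            "query": "collectd.plugin:statsd AND "
                     "(collectd.type:vm_disk_read_ops OR collectd.type:vm_disk_write_ops)"
        }
    },
}

resp = requests.get("%s/%s/_search" % (ES_URL, INDEX), json=query)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    c = hit["_source"]["collectd"]
    # plugin_instance holds the VM name, type_instance the disk name
    print(c["plugin_instance"], c["type_instance"], c["type"], c["statsd"][c["type"]])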

Comment 13 Yaniv Kaul 2017-10-10 21:08:41 UTC
(In reply to Lukas Svaty from comment #12)
> fields searchable in kibana:
> collectd.plugin:statsd AND ( collectd.type:vm_disk_read_ops OR
> collectd.type:vm_disk_write_ops)

Is it by disk name? ID? How do I search for it? How is it accumulated?
Does it work for direct LUN disks?

> 
> verified in
> ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch

Comment 14 Lukas Svaty 2017-10-11 08:13:29 UTC
Hi,

these are the fields you can search on:
ovirt.entity: "vms"
ovirt.host_id: `id of host`
collectd.plugin_instance: `name of the vm` ("test-vm")
collectd.type: "vm_disk_read_ops"|"vm_disk_write_ops"
collectd.type_instance: disk name (not partition!) ("sda"|"hdc")
ovirt.cluster_name: `name of cluster` ("Default")
collectd.plugin: "statsd"
collectd.statsd.vm_disk_read_ops `read SDIOPS`
collectd.statsd.vm_disk_write_ops `write SDIOPS`

collectd namespace of example sample:
"collectd": {
      "dstypes": [
        "gauge"
      ],
      "interval": 10,
      "plugin": "statsd",
      "plugin_instance": "test-vm",
      "type": "vm_disk_read_ops",
      "type_instance": "sda",
      "statsd": {
        "vm_disk_read_ops": 27436
      }
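
A short sketch, assuming a record shaped like the sample above has already been fetched from the metrics store as a Python dict, showing how the per-disk counter is read out of it:

sample = {
    "collectd": {
        "dstypes": ["gauge"],
        "interval": 10,
        "plugin": "statsd",
        "plugin_instance": "test-vm",
        "type": "vm_disk_read_ops",
        "type_instance": "sda",
        "statsd": {"vm_disk_read_ops": 27436},
    }
}

c = sample["collectd"]
vm, disk, metric = c["plugin_instance"], c["type_instance"], c["type"]
print("%s/%s %s = %d" % (vm, disk, metric, c["statsd"][metric]))
# -> test-vm/sda vm_disk_read_ops = 27436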

How these stats are collected on the vdsm side I do not know, as the patches included in this BZ were abandoned. Shirly, can you share the patch, or some insights on this?

As per your suggestion I want to re-test this with direct LUN, and possibly with different storage types as well. Moving back to ON_QA for now.

Comment 15 Yaniv Kaul 2017-10-11 08:55:51 UTC
(In reply to Lukas Svaty from comment #14)
> Hi,
> 
> these are the fields you can search on:
> ovirt.entity: "vms"
> ovirt.host_id: `id of host`
> collectd.plugin_instance: `name of the vm` ("test-vm")
> collectd.type: "vm_disk_read_ops"|"vm_disk_write_ops"
> collectd.type_instance: disk name (not partition!) ("sda"|"hdc")

This sounds incorrect to me - can you try:
1. With multiple disks
2. With different OS (Windows, for example).

What about hot-plugged disks?

> ovirt.cluster_name: `name of cluster` ("Default")
> collectd.plugin: "statsd"
> collectd.statsd.vm_disk_read_ops `read SDIOPS`
> collectd.statsd.vm_disk_write_ops `write SDIOPS`
> 
> collectd namespace of example sample:
> "collectd": {
>       "dstypes": [
>         "gauge"
>       ],
>       "interval": 10,
>       "plugin": "statsd",
>       "plugin_instance": "test-vm",
>       "type": "vm_disk_read_ops",
>       "type_instance": "sda",
>       "statsd": {
>         "vm_disk_read_ops": 27436
>       }
> 
> How are this stats collectd on vdsm side I do not know as the patches
> included in bz were abandoned. Shirly can you share the patch, or some
> insights on this?
> 
> As per your suggestion I want to re-test this direct-lun and maybe as well
> with different storage types. Moving back to ON_QA for now.

Comment 16 Shirly Radco 2017-10-15 08:36:12 UTC
As VM disk stats are reported by vdsm, I do not know specifically how they are collected.

Adding Yaniv B. Can you please elaborate on the VM disk statistics?

Comment 17 Yaniv Bronhaim 2017-10-15 08:43:18 UTC
https://gerrit.ovirt.org/59066 is the first commit, but we have more patches around this code that changed the format and things like that. Does it answer your question?

Comment 18 Shirly Radco 2017-10-16 10:05:39 UTC
No. How does vdsm collect it?

Comment 19 Yaniv Bronhaim 2017-10-16 10:51:28 UTC
The same way vdsm collects all virt stats and reports them to the engine: we just send the same values that come from the libvirt stats requests that we poll every interval.
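
A minimal sketch (not vdsm's actual code) of that kind of polling with the libvirt Python bindings; the connection URI, VM name and device name are example assumptions:

import time
import libvirt

conn = libvirt.openReadOnly("qemu:///system")
dom = conn.lookupByName("test-vm")          # example VM name

while True:
    # blockStats() returns cumulative (rd_req, rd_bytes, wr_req, wr_bytes, errs);
    # rd_req/wr_req are the cumulative read/write operation counters.
    rd_req, rd_bytes, wr_req, wr_bytes, errs = dom.blockStats("vda")
    print("vda read_ops=%d write_ops=%d" % (rd_req, wr_req))
    time.sleep(10)                          # poll interval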

Comment 20 Lukas Svaty 2017-12-08 14:57:14 UTC
These type_instance values are reported at the moment:

- hdc
- vda
- sda

This is for both Windows- and Linux-based VMs, as well as for VMs with multiple disks.

As we should be able to aggregate at least by VM disk, not only VM name, I am moving this RFE back to ASSIGNED.

Shirly: 
Can you please specify what these values are?

As we do not require ovirt-guest-agent, I doubt these are partitions inside the VM. Without a feature design/specification of what these values represent and what their benefit is, I am unable to do verification.

For this RFE I would like to test:
1. Aggregation by disks (to see overused disks in VMs)
2. Correctness of the data
3. Support for Linux/Windows-based images
4. Support for direct LUN and different storage types (NFS, Gluster, iSCSI...)

Comment 21 Shirly Radco 2018-01-10 10:42:23 UTC
Yaniv, can we report a more meaningful name as the disk name? The current name does not help the user understand which storage domain it is on.

Comment 22 Yaniv Bronhaim 2018-01-22 14:03:32 UTC
These are the fields I get from libvirt:
        "disks": {
            "vda": {
                "readLatency": "0", 
                "writtenBytes": "1597392896", 
                "truesize": "7324557312", 
                "apparentsize": "7324499968", 
                "readOps": "122422", 
                "writeLatency": "13576322", 
                "imageID": "f02d032d-88d8-4b84-8cfc-5a42c6fc884a", 
                "readBytes": "9565093376", 
                "flushLatency": "613964", 
                "readRate": "0.0", 
                "writeOps": "32742", 
                "writeRate": "614.4"
            }, 
            "hdc": {
                "readLatency": "0", 
                "writtenBytes": "0", 
                "truesize": "0", 
                "apparentsize": "0", 
                "readOps": "4", 
                "writeLatency": "0", 
                "readBytes": "152", 
                "flushLatency": "0", 
                "readRate": "0.0", 
                "writeOps": "0", 
                "writeRate": "0.0"
            
The storage-domains report, which comes in the host stats, contains the SD UUID:
    "518ea2d6-26a7-4705-88e6-ae83f1ff15e3": {
        "code": 0, 
        "actual": true, 
        "acquired": true, 
        "delay": "0.000872629", 
        "lastCheck": "2.3", 
        "version": 4, 
        "valid": true
    }
}

but it is per VM, and the name is the drive name - why isn't it meaningful?
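
A short sketch, assuming a dict shaped like the vdsm sample above, of pulling the cumulative per-drive operation counters out of it:

vm_stats = {
    "disks": {
        "vda": {"readOps": "122422", "writeOps": "32742"},
        "hdc": {"readOps": "4", "writeOps": "0"},
    }
}

for drive, stats in vm_stats["disks"].items():
    # readOps/writeOps are cumulative counters, reported as strings
    print("%s read_ops=%s write_ops=%s" % (drive, stats["readOps"], stats["writeOps"]))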

Comment 23 Yaniv Kaul 2018-01-22 14:09:07 UTC
The name is not the drive name - it's what libvirt guesses the drive name could be. It would show 'vda' for a virtio-blk device in a Windows VM as well...

Comment 24 Yaniv Kaul 2018-03-19 08:41:38 UTC
Shirly, what's the latest here? Is it going to make it to 4.2.2?

Comment 25 Shirly Radco 2018-03-20 11:00:08 UTC
As discussed with Yaniv Lavi, the collectd per-VM, per-disk metrics from the virt plugin are sufficient for this RFE.

collectd.virt.disk_ops.read
collectd.virt.disk_ops.write

These metrics should allow the user to quickly and easily identify the VMs that utilize high I/O, as requested in this RFE.

Disk name is saved as collectd.type_instance.
VM name is saved as collectd.plugin_instance.
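
A hedged sketch of an aggregation that surfaces the heaviest readers per VM/disk pair using these field names; the Elasticsearch endpoint and index pattern are assumptions, and depending on the index mapping the terms fields may need a ".raw"/".keyword" suffix:

import requests

ES_URL = "https://elasticsearch.example.com:9200"   # assumed endpoint
INDEX = "project.ovirt-metrics-*"                   # assumed index pattern

query = {
    "size": 0,
    "query": {"exists": {"field": "collectd.virt.disk_ops.read"}},
    "aggs": {
        "by_vm": {
            "terms": {"field": "collectd.plugin_instance", "size": 10},
            "aggs": {
                "by_disk": {
                    "terms": {"field": "collectd.type_instance", "size": 10},
                    "aggs": {
                        "read_ops": {"max": {"field": "collectd.virt.disk_ops.read"}}
                    }
                }
            }
        }
    },
}

resp = requests.get("%s/%s/_search" % (ES_URL, INDEX), json=query)
resp.raise_for_status()
for vm in resp.json()["aggregations"]["by_vm"]["buckets"]:
    for disk in vm["by_disk"]["buckets"]:
        print(vm["key"], disk["key"], disk["read_ops"]["value"])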


As for the second request, to aggregate storage IOPS per VM, it is not possible at this stage: since the disk name is not the actual drive name but what libvirt guesses, it cannot be aggregated across VMs.

Comment 26 Lukas Svaty 2018-03-21 16:09:51 UTC
Moving to VERIFIED; will create a separate bug for reporting disk stats from inside the OS.

verified in ovirt-engine-metrics-1.1.3.3-1.el7ev.noarch

Comment 27 Sandro Bonazzola 2018-03-29 10:59:21 UTC
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

