Created attachment 1187860 [details]
Host page: Network utilization vs Network Throughput

Description of problem:

The Network Utilization chart does not seem to work properly. I have a HW cluster with 3 MON and 4 OSD nodes and two configured networks (1G and 10G). I utilize the network via iperf (`iperf -s` on the first OSD node and `iperf -c 192.168.100.101 --time 36000` on the second OSD node; 192.168.100.101 is the IP address of the 10G network interface on the first OSD node).

Command `nload` on the interface p2p1 (192.168.100.102) shows the following values:

Curr: 4.65 GBit/s
Avg: 4.65 GBit/s
Min: 4.64 GBit/s
Max: 4.65 GBit/s
Ttl: 11045.28 GByte

This traffic is also visible in the Host -> Performance -> Network Throughput chart in USM, but the Network Utilization section shows zero for all values.

Version-Release number of selected component (if applicable):

USM Server (RHEL 7.2):
ceph-installer-1.0.14-1.el7scon.noarch
libcollection-0.6.2-25.el7.x86_64
ceph-ansible-1.0.5-32.el7scon.noarch
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-ui-0.0.51-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64

Ceph OSD/MON node (RHEL 7.2):
calamari-server-1.4.8-1.el7cp.x86_64
ceph-base-10.2.2-33.el7cp.x86_64
ceph-common-10.2.2-33.el7cp.x86_64
ceph-mon-10.2.2-33.el7cp.x86_64
ceph-osd-10.2.2-33.el7cp.x86_64
ceph-selinux-10.2.2-33.el7cp.x86_64
collectd-ping-5.5.1-1.1.el7.x86_64
collectd-5.5.1-1.1.el7.x86_64
libcephfs1-10.2.2-33.el7cp.x86_64
libcollection-0.6.2-25.el7.x86_64
python-cephfs-10.2.2-33.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch

How reproducible:
100%

Steps to Reproduce:
1. Utilize the network by running `iperf -s` on one node and `iperf -c 192.168.100.101 --time 36000` on a second node.

Actual results:
The Network Utilization chart shows zeros.

Expected results:
The Network Utilization chart shows meaningful data.

Additional info:
See the attached screenshots.
Created attachment 1187861 [details]
Network traffic measured by `nload`
Tested with:

server:
ceph-ansible-1.0.5-32.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-ui-0.0.52-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.41-1.el7scon.noarch

node:
calamari-server-1.4.8-1.el7cp.x86_64
ceph-base-10.2.2-36.el7cp.x86_64
ceph-common-10.2.2-36.el7cp.x86_64
ceph-mon-10.2.2-36.el7cp.x86_64
ceph-selinux-10.2.2-36.el7cp.x86_64
libcephfs1-10.2.2-36.el7cp.x86_64
python-cephfs-10.2.2-36.el7cp.x86_64
rhscon-agent-0.0.18-1.el7scon.noarch
rhscon-core-selinux-0.0.41-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.41-1.el7scon.noarch

and there are these issues:

1) The Host dashboard shows Performance -> Throughput in the wrong units. It currently reads "312112629.0 KB/s", but it should be "312112629.0 packets/s", because the value is interface-rx_tx from Graphite.

2) The Network -> Utilization units are wrong. It currently shows "GB", but it should be "GB/s".
Network throughput is calculated as the sum of the average interface rx and the average interface tx across all interfaces of the node. Network utilization is calculated as the sum of rx and tx across all interfaces of the node, divided by the sum of the bandwidths of all interfaces, and the result is multiplied by 100 to get a percentage.
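A minimal sketch of the two calculations described above, assuming per-interface rx/tx rates and bandwidths (all in bytes/s) are already available; the names below are illustrative, not the actual rhscon code:

def network_throughput(interfaces):
    # sum of average rx and average tx rates across all interfaces (bytes/s)
    return sum(iface["rx_avg"] + iface["tx_avg"] for iface in interfaces)

def network_utilization_percent(interfaces):
    # total rx+tx rate divided by total interface bandwidth, as a percentage
    total_rate = sum(iface["rx"] + iface["tx"] for iface in interfaces)
    total_bandwidth = sum(iface["bandwidth"] for iface in interfaces)
    return 100.0 * total_rate / total_bandwidth if total_bandwidth else 0.0

# Example: one 10G interface (~1.25e9 bytes/s) carrying ~581 MB/s of traffic
interfaces = [
    {"rx": 580e6, "tx": 1e6, "rx_avg": 580e6, "tx_avg": 1e6, "bandwidth": 1.25e9},
]
print(network_throughput(interfaces))           # ~5.81e8 bytes/s
print(network_utilization_percent(interfaces))  # ~46.5 (percent)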
Created attachment 1189267 [details]
network throughput

Could you please explain how these two numbers are related and how throughput is calculated?
(In reply to anmol babu from comment #7)

"average of interface rx and average of interface tx across all interfaces of node" in other words means "packets/s", not KB/s.

For example, RX and TX from the ifconfig output of this node:

em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.16.157.12  netmask 255.255.248.0  broadcast 10.16.159.255
        inet6 fe80::d6be:d9ff:feb3:8ef0  prefixlen 64  scopeid 0x20<link>
        ether d4:be:d9:b3:8e:f0  txqueuelen 1000  (Ethernet)
---->   RX packets 964147750  bytes 1076760752912 (1002.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
---->   TX packets 2685424374  bytes 3982185569935 (3.6 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

em2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether d4:be:d9:b3:8e:f2  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

em3: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether d4:be:d9:b3:8e:f4  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

em4: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether d4:be:d9:b3:8e:f6  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 2608699  bytes 2085705817 (1.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2608699  bytes 2085705817 (1.9 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

p1p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 192.168.100.105  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::92e2:baff:fe04:7e80  prefixlen 64  scopeid 0x20<link>
        ether 90:e2:ba:04:7e:80  txqueuelen 1000  (Ethernet)
---->   RX packets 1780672903  bytes 25882228119558 (23.5 TiB)
        RX errors 0  dropped 132320  overruns 0  frame 0
---->   TX packets 2928330198  bytes 18071710203275 (16.4 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

p1p2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 90:e2:ba:04:7e:81  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

So please change the unit KB/s to packets/s in the right graph.
As anmol explained in the call, we are taking octets/sec (an octet is nothing but a byte), not packets/s, from collectd. So if you change the UI to packets per sec it won't be correct.

As discussed in the call we will make two changes in the UI:

1) Network utilization: KB/MB/GB per sec
2) Network throughput: KB/MB/GB per sec
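For reference, one way to cross-check the octets/sec figure on a node independently of collectd/Graphite is to sample the kernel byte counters twice and take the delta. This is only a verification sketch, and the interface name is just an example:

import time

def read_bytes(iface):
    # /sys/class/net/<iface>/statistics holds cumulative byte counters
    base = "/sys/class/net/{}/statistics".format(iface)
    with open(base + "/rx_bytes") as rx, open(base + "/tx_bytes") as tx:
        return int(rx.read()), int(tx.read())

iface = "p1p1"
rx1, tx1 = read_bytes(iface)
time.sleep(10)
rx2, tx2 = read_bytes(iface)
rate = ((rx2 - rx1) + (tx2 - tx1)) / 10.0                 # bytes per second
print("{}: {:.2f} Gbit/s".format(iface, rate * 8 / 1e9))  # compare with nload / the chart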
(In reply to Nishanth Thomas from comment #10)
> As anmol explained in the call, we are taking octets/sec (an octet is
> nothing but a byte), not packets/s, from collectd. So if you change the UI
> to packets per sec it won't be correct.
>
> As discussed in the call we will make two changes in the UI:
>
> 1) Network utilization: KB/MB/GB per sec
> 2) Network throughput: KB/MB/GB per sec

Network throughput is time-series data; we cannot convert it to KB/MB/GB per sec, so it will be plotted as B/s.
(In reply to Karnan from comment #11)
> (In reply to Nishanth Thomas from comment #10)
> > As anmol explained in the call, we are taking octets/sec (an octet is
> > nothing but a byte), not packets/s, from collectd. So if you change the
> > UI to packets per sec it won't be correct.
> >
> > As discussed in the call we will make two changes in the UI:
> >
> > 1) Network utilization: KB/MB/GB per sec
> > 2) Network throughput: KB/MB/GB per sec
>
> Network throughput is time-series data; we cannot convert it to KB/MB/GB
> per sec, so it will be plotted as B/s.

Thresholds come as time-series data in bytes. At one point the value can be in KB and the next moment it can be in GB, so dynamically switching units for the whole data set is not feasible. So we are sticking with B/s.
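To illustrate the point, here is a small sketch (the helper below is hypothetical, not from the rhscon-ui code): picking the "nicest" unit per sample yields mixed units within one series, so a single chart axis has to be scaled by one fixed unit, and B/s is the safe default.

UNITS = [("B/s", 1.0), ("KB/s", 1e3), ("MB/s", 1e6), ("GB/s", 1e9)]

def best_unit(value):
    # largest unit that still leaves the sample >= 1
    chosen = UNITS[0]
    for name, factor in UNITS:
        if value >= factor:
            chosen = (name, factor)
    return chosen

series = [512.0, 48e3, 3.2e6, 4.6e9]   # one node's rx+tx samples in bytes/s
print(["{:.1f} {}".format(v / best_unit(v)[1], best_unit(v)[0]) for v in series])
# ['512.0 B/s', '48.0 KB/s', '3.2 MB/s', '4.6 GB/s'] -- mixed units on one axis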
Moving to ON_QA. Fixed in version: rhscon-ui-0.0.53-1.el7scon
Tested with rhscon-ui-0.0.53-1.el7scon.noarch.rpm and the units are correct now.

In the "network throughput" graph, values are shown in B/s as large numbers because the units cannot be dynamically changed; see bug 1365995.

There is also a request to document how network throughput is calculated:
https://bugzilla.redhat.com/show_bug.cgi?id=1338692#c5

There is also bug 1365989 for calculating network throughput only from interfaces related to Ceph.
*** Bug 1366083 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1754