Bug 1960612 - Node disk info in overview/details does not account for second drive where /var is located [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Console Kubevirt Plugin
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Yaacov Zamir
QA Contact: Guohua Ouyang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-14 11:36 UTC by Neil Girard
Modified: 2021-07-27 23:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The previous Prometheus query took only the size of the disk mounted on "/". Consequence: If the machine has more than one disk mounted, any disk not mounted on "/" is ignored in the storage capacity calculation. Fix: Sum the sizes of devices mounted at any mount point. Result: The storage capacity shown in the OCP UI is similar to the value a user would get by running the "lsblk" command-line tool and summing the sizes of all mounted file systems.
Clone Of:
Environment:
Last Closed: 2021-07-27 23:08:28 UTC
Target Upstream Version:
yzamir: needinfo? (tnisan)


Attachments
Node Overview (42.73 KB, image/png)
2021-05-14 11:36 UTC, Neil Girard


Links
System ID Private Priority Status Summary Last Updated
Github openshift console pull 8978 0 None open Bug 1960612: Make filesystem queries use all devices 2021-05-19 11:39:20 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:08:45 UTC

Description Neil Girard 2021-05-14 11:36:53 UTC
Created attachment 1783174 [details]
Node Overview

Description of problem:
When creating a cluster, the cluster was configured to have two disk drives.  The second was to house /var following the steps at:

https://docs.openshift.com/container-platform/4.6/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-user-infra-machines-advanced_vardisk_installing-platform-agnostic

After installation, we confirmed the drives were created as expected on the masters:

[core@master1 ~]$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   41G  0 disk
|-sda1                         8:1    0  384M  0 part /boot
|-sda2                         8:2    0  127M  0 part /boot/efi
|-sda3                         8:3    0    1M  0 part
`-sda4                         8:4    0 40.5G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 40.5G  0 dm   /sysroot
sdb                            8:16   0   40G  0 disk
`-sdb1                         8:17   0   38G  0 part /var

The nodes have a total of about 80G of disk configured, but the overview/details pages show only 40G.  Is there a way for these screens to show the full system capacity?  Is that expected?
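
For reference, the expected total can be checked by summing the SIZE column of the "disk" rows in the lsblk output above. The following is only a sketch in Python: the helper names (size_to_gib, total_disk_gib) are hypothetical, and it assumes sizes carry one of the suffixes K, M, G or T.

```python
import re

# lsblk output copied from the report above.
LSBLK_OUTPUT = """\
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   41G  0 disk
|-sda1                         8:1    0  384M  0 part /boot
|-sda2                         8:2    0  127M  0 part /boot/efi
|-sda3                         8:3    0    1M  0 part
`-sda4                         8:4    0 40.5G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 40.5G  0 dm   /sysroot
sdb                            8:16   0   40G  0 disk
`-sdb1                         8:17   0   38G  0 part /var
"""

# Multipliers that convert each lsblk size suffix to GiB.
UNIT_TO_GIB = {"K": 1 / (1024 * 1024), "M": 1 / 1024, "G": 1, "T": 1024}


def size_to_gib(text):
    """Convert an lsblk size such as '40.5G' or '384M' to GiB."""
    match = re.fullmatch(r"([\d.]+)([KMGT])", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNIT_TO_GIB[match.group(2)]


def total_disk_gib(output):
    """Sum the sizes of whole disks (TYPE == 'disk'), skipping partitions."""
    total = 0.0
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINT]
        if len(fields) >= 6 and fields[5] == "disk":
            total += size_to_gib(fields[3])
    return total


print(total_disk_gib(LSBLK_OUTPUT))
```

Only whole-disk rows are counted, so partitions and the device-mapper volume on top of sda4 are not added a second time.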

Version-Release number of selected component (if applicable):
4.6

How reproducible:
always

Steps to Reproduce:
1. Create a cluster whose masters and workers have two drives, with /var on the second disk, following the link above.
2. Access the console.
3. Navigate to Compute -> Nodes and click the desired node.
4. Click Overview or Details.

Actual results:
Only the first drive's capacity is shown on the overview/details page of nodes

Expected results:
Show both disk capacities

Additional info:

Comment 1 Yaacov Zamir 2021-05-18 10:21:31 UTC
@sradco@redhat.com hi,

Do you know the correct query to get the capacity of all disks in a cluster node?

This is the query used now:
https://github.com/openshift/console/blob/d0f39ed0e674dfc03edf3847898569ee8480f7f0/frontend/packages/metal3-plugin/src/components/baremetal-hosts/dashboard/queries.ts#L34

Comment 2 Yaacov Zamir 2021-05-19 12:06:26 UTC
Ref:
https://bugzilla.redhat.com/show_bug.cgi?id=1909004

Comment 4 Yaacov Zamir 2021-05-19 13:20:33 UTC
Note I: 
The bz in comment#2 is not a duplicate; it is a different bug concerning the query used to collect the filesystem data.

In https://bugzilla.redhat.com/show_bug.cgi?id=1909004 it was decided to collect only the size of the disk mounted on "/",
and this caused this bug, where we count only one disk even if two are used by the node.

Note II:
https://github.com/openshift/console/pull/8978
tries to fix that by summing up by device instead of by mount point.

Comment 5 Neil Girard 2021-05-19 14:27:18 UTC
Ah, that makes sense.  Thanks!

Comment 6 Neil Girard 2021-05-24 17:20:32 UTC
@yzamir@redhat.com I have tested the query:

sum by (instance) (max by (device, instance) (node_filesystem_size_bytes{device=~"/.*"}))

against my cluster and it returns the cumulative size of both drives.
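
To illustrate how the query above aggregates, here is a rough Python model of it. This is a sketch under assumptions: the sample values and label sets are made up, and capacity_root_only only approximates the earlier "disk mounted on /" behavior described in comment 4; it is not the previous query itself.

```python
import re
from collections import defaultdict

GIB = 1024 ** 3

# Hypothetical node_filesystem_size_bytes samples:
# (instance, device, mountpoint, size in bytes). Values are illustrative only.
SAMPLES = [
    ("master1", "/dev/mapper/coreos-luks-root-nocrypt", "/", 40 * GIB),
    ("master1", "/dev/mapper/coreos-luks-root-nocrypt", "/sysroot", 40 * GIB),
    ("master1", "/dev/sdb1", "/var", 38 * GIB),
    ("master1", "tmpfs", "/run", 2 * GIB),
]


def capacity_all_devices(samples):
    """Model of: sum by (instance) (max by (device, instance) (...{device=~"/.*"}))."""
    per_device = {}
    for instance, device, _mountpoint, size in samples:
        # PromQL regex matchers are fully anchored, hence fullmatch.
        if re.fullmatch(r"/.*", device) is None:
            continue  # drops pseudo filesystems such as tmpfs
        key = (instance, device)
        # max by (device, instance): a device mounted at several
        # mount points is counted only once.
        per_device[key] = max(per_device.get(key, 0), size)
    totals = defaultdict(int)
    for (instance, _device), size in per_device.items():
        totals[instance] += size  # sum by (instance)
    return dict(totals)


def capacity_root_only(samples):
    """Assumed model of the earlier behavior: only the filesystem mounted on "/"."""
    totals = defaultdict(int)
    for instance, _device, mountpoint, size in samples:
        if mountpoint == "/":
            totals[instance] += size
    return dict(totals)


print(capacity_all_devices(SAMPLES)["master1"] // GIB)
print(capacity_root_only(SAMPLES)["master1"] // GIB)
```

The inner max step is what keeps a device from being counted once per mount point; the outer sum then adds one value per device for each node.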

Comment 8 Guohua Ouyang 2021-06-08 02:52:54 UTC
Verified on master.

On 4.8.0-0.nightly-2021-06-01-043518, the filesystem utilization of the node is '6.52 GiB available of 39.49 GiB'.
On master, the filesystem utilization of the same node is '71.54 GiB available of 108.2 GiB'.

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0    7:0    0   70G  0 loop 
sr0     11:0    1  492K  0 rom  
rbd0   251:0    0   50G  0 disk /var/lib/kubelet/pods/96d5e356-365d-41eb-a3de-0ac143d97e09/volumes/kubernetes.io~csi/pvc-4bf3e5b7-825a-
vda    252:0    0   40G  0 disk 
├─vda1 252:1    0    1M  0 part 
├─vda2 252:2    0  127M  0 part 
├─vda3 252:3    0  384M  0 part /boot
└─vda4 252:4    0 39.5G  0 part /sysroot
vdb    252:16   0   70G  0 disk /var/hpvolumes
vdc    252:32   0   70G  0 disk

Comment 17 errata-xmlrpc 2021-07-27 23:08:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

