1960612 – Node disk info in overview/details does not account for second drive where /var is located

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Console Kubevirt Plugin
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Yaacov Zamir
QA Contact:	Guohua Ouyang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-05-14 11:36 UTC by Neil Girard
Modified:	2024-10-01 18:12 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The Prometheus query took only the size of the disk mounted on "/". Consequence: If the machine has more than one disk mounted, any disk not mounted on "/" is ignored in the storage capacity calculation. Fix: Sum the sizes of the devices mounted at any mount point. Result: The storage capacity shown in the OCP UI is similar to the value a user gets by running the "lsblk" command-line tool and summing the sizes of all mounted file systems.
Clone Of:
Environment:
Last Closed:	2021-07-27 23:08:28 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Node Overview (42.73 KB, image/png) 2021-05-14 11:36 UTC, Neil Girard	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift console pull 8978	0	None	open	Bug 1960612: Make filesystem queries use all devices	2021-05-19 11:39:20 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:08:45 UTC

Description Neil Girard 2021-05-14 11:36:53 UTC

Created attachment 1783174 [details]
Node Overview

Description of problem:
When creating a cluster, the cluster was configured to have two disk drives.  The second was to house /var following the steps at:

https://docs.openshift.com/container-platform/4.6/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-user-infra-machines-advanced_vardisk_installing-platform-agnostic

After installation, we confirmed the drives were created as expected on the masters:

[core@master1 ~]$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   41G  0 disk
|-sda1                         8:1    0  384M  0 part /boot
|-sda2                         8:2    0  127M  0 part /boot/efi
|-sda3                         8:3    0    1M  0 part
`-sda4                         8:4    0 40.5G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 40.5G  0 dm   /sysroot
sdb                            8:16   0   40G  0 disk
`-sdb1                         8:17   0   38G  0 part /var

The nodes have a total of 80G of disk configured but the overview/details only shows 40.  Is there a way for this to show the full system capacity on these screens?  Is it expected to?
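
For reference, the expected figure can be reproduced from the lsblk output above. The following Python sketch is illustrative only and not part of the console: the helper names parse_size and disk_sizes are hypothetical, and it assumes lsblk's default 1024-based size suffixes. It counts only rows whose TYPE is "disk", so partitions and device-mapper targets are not counted twice.

```python
import re

# lsblk output copied from the report above.
LSBLK_OUTPUT = """\
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   41G  0 disk
|-sda1                         8:1    0  384M  0 part /boot
|-sda2                         8:2    0  127M  0 part /boot/efi
|-sda3                         8:3    0    1M  0 part
`-sda4                         8:4    0 40.5G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 40.5G  0 dm   /sysroot
sdb                            8:16   0   40G  0 disk
`-sdb1                         8:17   0   38G  0 part /var
"""

# Assumption: lsblk prints sizes with 1024-based suffixes.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert an lsblk size such as '40.5G' to bytes (hypothetical helper)."""
    match = re.fullmatch(r"([\d.]+)([KMGT])", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def disk_sizes(lsblk_output):
    """Return {disk name: size in bytes} for rows whose TYPE is 'disk'."""
    sizes = {}
    for line in lsblk_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINT]
        name, size, dev_type = fields[0], fields[3], fields[5]
        if dev_type == "disk":
            sizes[name] = parse_size(size)
    return sizes


if __name__ == "__main__":
    sizes = disk_sizes(LSBLK_OUTPUT)
    total_gib = sum(sizes.values()) / 2**30
    print(f"{len(sizes)} disks, {total_gib:g} GiB in total")
```

For the output above this sums sda (41G) and sdb (40G) to 81G, in line with the roughly 80G expected on the overview/details pages.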

Version-Release number of selected component (if applicable):
4.6

How reproducible:
always

Steps to Reproduce:
1. Create a cluster whose masters and workers have 2 drives, with /var on the second disk, following the link above.
2. Access the console.
3. Navigate to Compute -> Nodes and click on the desired node.
4. Click on Overview or Details.

Actual results:
Only the first drive's capacity is shown on the overview/details page of nodes

Expected results:
Show both disk capacities

Additional info:

Comment 1 Yaacov Zamir 2021-05-18 10:21:31 UTC

@sradco hi,

Do you know the correct query to get the capacity of all disks in a cluster node?

This is the query used now:
https://github.com/openshift/console/blob/d0f39ed0e674dfc03edf3847898569ee8480f7f0/frontend/packages/metal3-plugin/src/components/baremetal-hosts/dashboard/queries.ts#L34

Comment 2 Yaacov Zamir 2021-05-19 12:06:26 UTC

Ref:
https://bugzilla.redhat.com/show_bug.cgi?id=1909004

Comment 4 Yaacov Zamir 2021-05-19 13:20:33 UTC

Note I: 
The bz in comment#2 is not a duplicate; it is a different bug concerning the query used to collect the filesystem data.

In https://bugzilla.redhat.com/show_bug.cgi?id=1909004 it was decided to collect only the size of the disk mounted on "/",
and this caused this bug, where we count only one disk even if two are used by the node.

Note II:
https://github.com/openshift/console/pull/8978
tries to fix that by summing up by device instead of by mount point.

Comment 5 Neil Girard 2021-05-19 14:27:18 UTC

Ah, that makes sense.  Thanks!

Comment 6 Neil Girard 2021-05-24 17:20:32 UTC

@yzamir I have tested the query:

sum by (instance) (max by (device, instance) (node_filesystem_size_bytes{device=~"/.*"}))

against my cluster and it returns the cumulative size of both drives.
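
To illustrate why this query returns the cumulative size, here is a minimal Python sketch that emulates its two aggregation steps on made-up samples. Everything in it is an assumption for illustration: the sample labels and values are hypothetical (they are not taken from the cluster), and root_only is a simplified stand-in for the earlier "only the disk mounted on /" behaviour, not the actual console query. The inner step keeps one value per (instance, device), so a device mounted at several mount points is counted once; the outer step sums those values per instance.

```python
from collections import defaultdict

GIB = 2**30

# Hypothetical node_filesystem_size_bytes samples as (labels, value) pairs.
SAMPLES = [
    # The same device can appear under more than one mount point.
    ({"instance": "master1", "device": "/dev/mapper/coreos-luks-root-nocrypt",
      "mountpoint": "/"}, 40 * GIB + GIB // 2),
    ({"instance": "master1", "device": "/dev/mapper/coreos-luks-root-nocrypt",
      "mountpoint": "/sysroot"}, 40 * GIB + GIB // 2),
    ({"instance": "master1", "device": "/dev/sdb1",
      "mountpoint": "/var"}, 38 * GIB),
    # Pseudo filesystems do not start with "/" and are filtered out.
    ({"instance": "master1", "device": "tmpfs",
      "mountpoint": "/run"}, 8 * GIB),
]


def root_only(samples):
    """Simplified stand-in for the old behaviour: count only mountpoint "/"."""
    totals = defaultdict(int)
    for labels, value in samples:
        if labels["mountpoint"] == "/":
            totals[labels["instance"]] += value
    return dict(totals)


def all_devices(samples):
    """Emulate: sum by (instance) (max by (device, instance) (...{device=~"/.*"}))."""
    per_device = {}
    for labels, value in samples:
        # PromQL regex matchers are anchored, so "/.*" means "starts with /".
        if not labels["device"].startswith("/"):
            continue
        key = (labels["instance"], labels["device"])
        per_device[key] = max(per_device.get(key, 0), value)  # max by (device, instance)
    totals = defaultdict(int)
    for (instance, _device), value in per_device.items():
        totals[instance] += value  # sum by (instance)
    return dict(totals)


if __name__ == "__main__":
    print("root only  :", root_only(SAMPLES)["master1"] / GIB, "GiB")
    print("all devices:", all_devices(SAMPLES)["master1"] / GIB, "GiB")
```

With these made-up samples, root_only counts 40.5 GiB while all_devices counts 78.5 GiB, because /dev/sdb1 is now included and the root device is still counted only once.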

Comment 8 Guohua Ouyang 2021-06-08 02:52:54 UTC

Verified on master.

On 4.8.0-0.nightly-2021-06-01-043518, the filesystem's utilization of the node is '6.52 GiB available of 39.49 GiB'
On master, the filesystem's utilization of the same node is '71.54 GiB available of 108.2 GiB'

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0    7:0    0   70G  0 loop 
sr0     11:0    1  492K  0 rom  
rbd0   251:0    0   50G  0 disk /var/lib/kubelet/pods/96d5e356-365d-41eb-a3de-0ac143d97e09/volumes/kubernetes.io~csi/pvc-4bf3e5b7-825a-
vda    252:0    0   40G  0 disk 
├─vda1 252:1    0    1M  0 part 
├─vda2 252:2    0  127M  0 part 
├─vda3 252:3    0  384M  0 part /boot
└─vda4 252:4    0 39.5G  0 part /sysroot
vdb    252:16   0   70G  0 disk /var/hpvolumes
vdc    252:32   0   70G  0 disk

Comment 17 errata-xmlrpc 2021-07-27 23:08:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 18 Red Hat Bugzilla 2023-09-15 01:06:36 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
