Bug 1339540

Summary: "nova hypervisor-show" is not considering ceph replica while calculating disk_available_least parameter.
Product: Red Hat OpenStack
Reporter: VIKRANT <vaggarwa>
Component: openstack-nova
Assignee: melanie witt <mwitt>
Status: CLOSED ERRATA
QA Contact: Paras Babbar <pbabbar>
Severity: high
Priority: high
Version: 7.0 (Kilo)
CC: cshastri, dasmith, eglynn, kchamart, lyarwood, mbooth, mwitt, pbabbar, sbauza, sclewis, sgordon, srevivo, udayendu.kar, vromanso
Target Milestone: rc
Keywords: Triaged
Target Release: 16.0 (Train on RHEL 8.1)
Hardware: All
OS: Linux
Fixed In Version: openstack-nova-20.0.1-0.20191025043858.390db63.el8ost
Doc Type: If docs needed, set a value
Last Closed: 2020-02-06 14:37:21 UTC
Type: Bug
Bug Blocks: 1761623, 1761625, 1761627

Description VIKRANT 2016-05-25 09:39:38 UTC
Description of problem:
The Horizon dashboard uses the "nova hypervisor-stats" command to display the hypervisor statistics. "nova hypervisor-show" does not take the ceph replica count into account when calculating the disk_available_least parameter.

In this example each ceph node has 54 GB of OSD disk assigned and the replica count is set to 3, so the usable space is 54 GB. But the output shows 161 GB, which is roughly 54 * 3 = 162 GB:

nova hypervisor-show overcloud-compute-1.localdomain | grep "disk_available_least"
| disk_available_least      | 161                                      |
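The arithmetic behind the mismatch can be sketched as follows. This is a minimal illustration using the figures from this report (54 GB of OSD disk per node, 3 nodes, replica count 3), not nova code:

```python
# Replica-unaware vs. replica-aware capacity, using the numbers above.
raw_per_node_gb = 54   # OSD disk assigned per ceph node (from this report)
nodes = 3
replicas = 3

raw_total_gb = raw_per_node_gb * nodes   # what nova reports (~161 GB after overhead)
usable_gb = raw_total_gb // replicas     # what the operator expects

print(raw_total_gb, usable_gb)  # 162 54
```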


Version-Release number of selected component (if applicable):
RHEL OSP 7

How reproducible:
Every time.

Steps to Reproduce:
1. Configure ceph as backend for nova. 
2. Check the Hypervisor summary in Horizon dashboard or in nova hypervisor commands. 

Actual results:
The command output shows the wrong amount of available space.

Expected results:
It should show only 54 GB of usable space.

Additional info:

Comment 4 melanie witt 2016-06-06 23:35:44 UTC
I have been investigating this bug for the past week.

First, I was able to reproduce the behavior in devstack with CEPH_REPLICAS=2 and the devstack-plugin-ceph. The disk allocated for ceph was 8G (default created by devstack-plugin-ceph) and the available disk was showing as 15G because of two OSDs. The devstack-plugin-ceph creates one OSD per replica, and this results in two OSDs running on a single 8G disk.

Next, I saw the available disk reported by the ceph tools was 15G using "ceph df". And I saw that each OSD was seeing 8G available using "ceph osd df". Nova calls ceph to query the available disk.

I dug into the source of the ceph tools and found indeed it uses the sum of each OSD available disk to calculate the total available disk in the cluster, without regard for number of replicas. So the available disk will be the sum of all the values in "ceph osd df".

There appeared to be an assumption in ceph that there is one OSD deployed per disk. I looked into the documentation [1] for more information. It says, "Tip Running multiple OSDs on a single disk–irrespective of partitions–is NOT a good idea." Reasoning is explained in the doc.

So, if you have deployed ceph with one OSD per disk, the total available disk reported by ceph will be correct. If you have deployed ceph with multiple OSDs per disk, the total available disk reported will be too high depending on how many OSDs you have overlapped per disk.

The accuracy is dependent on how ceph has been deployed, which is why I think we can't "fix" this. When ceph is deployed as recommended, the value will be correct. And I think Nova is doing the right thing by querying ceph for the available disk and not doing extra calculations upon it.

[1] http://docs.ceph.com/docs/hammer/start/hardware-recommendations/#hard-disk-drives
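The summation described above can be sketched like this (a hedged illustration, not the actual ceph source): cluster-wide available space is just the sum of per-OSD AVAIL, with no division by replica count and no awareness of OSDs sharing a disk. The 8G figures are from the devstack reproduction with CEPH_REPLICAS=2:

```python
# Two OSDs were running on the same single 8G disk, so each
# reported ~8G available; ceph sums them without deduplication.
osd_avail_gb = [8, 8]                 # per-OSD AVAIL, as in "ceph osd df"
cluster_avail_gb = sum(osd_avail_gb)  # ~16G, matching the ~15G shown by
                                      # "ceph df" (minus filesystem overhead)
print(cluster_avail_gb)  # 16
```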

Comment 16 Udayendu Kar 2018-03-17 06:09:40 UTC
I am using RHOSP 11 in our production setup with 3 ceph nodes, 10 HDDs on each node. So we have 30 HDDs in total, each with one OSD on it:

# ceph -s
    cluster 7bcf11d2-0bd1-11e8-ad27-80c16e714008
     health HEALTH_WARN
            too few PGs per OSD (28 < min 30)
     monmap e1: 3 mons at {overcloud-controller-0=192.168.25.26:6789/0,overcloud-controller-1=192.168.25.29:6789/0,overcloud-controller-2=192.168.25.21:6789/0}
            election epoch 2566, quorum 0,1,2 overcloud-controller-2,overcloud-controller-0,overcloud-controller-1
     osdmap e617: 30 osds: 30 up, 30 in
            flags sortbitwise,require_jewel_osds,recovery_deletes
      pgmap v1636673: 288 pgs, 8 pools, 1546 GB data, 413 kobjects
            4603 GB used, 159 TB / 163 TB avail
                 288 active+clean
  client io 1207 kB/s rd, 1465 kB/s wr, 3915 op/s rd, 205 op/s wr


Ceph storage is deployed correctly and working as expected. On the storage server I can see the ceph OSD details correctly:

# ceph osd df
ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS
 0 5.45609  1.00000 5587G  7972M 5579G 0.14 0.05  27
 3 5.45609  1.00000 5587G   248G 5338G 4.44 1.62  34
 6 5.45609  1.00000 5587G 98374M 5490G 1.72 0.63  23
 9 5.45609  1.00000 5587G   187G 5399G 3.35 1.22  28
12 5.45609  1.00000 5587G 53283M 5534G 0.93 0.34  28
14 5.45609  1.00000 5587G   202G 5384G 3.62 1.32  30
16 5.45609  1.00000 5587G   187G 5399G 3.35 1.22  31
20 5.45609  1.00000 5587G   136G 5450G 2.44 0.89  25
22 5.45609  1.00000 5587G   184G 5402G 3.31 1.20  35
25 5.45609  1.00000 5587G   232G 5354G 4.16 1.52  27
 1 5.45609  1.00000 5587G   185G 5401G 3.32 1.21  23
 5 5.45609  1.00000 5587G   266G 5320G 4.78 1.74  25
 8 5.45609  1.00000 5587G 57337M 5531G 1.00 0.36  28
11 5.45609  1.00000 5587G   231G 5355G 4.15 1.51  37
15 5.45609  1.00000 5587G   107G 5479G 1.92 0.70  32
17 5.45609  1.00000 5587G   143G 5443G 2.57 0.94  24
21 5.45609  1.00000 5587G   115G 5471G 2.07 0.75  36
24 5.45609  1.00000 5587G   147G 5439G 2.64 0.96  34
27 5.45609  1.00000 5587G 93699M 5495G 1.64 0.60  21
29 5.45609  1.00000 5587G   188G 5398G 3.37 1.23  28
 2 5.45609  1.00000 5587G   104G 5482G 1.87 0.68  24
 4 5.45609  1.00000 5587G 53441M 5534G 0.93 0.34  31
 7 5.45609  1.00000 5587G   152G 5434G 2.74 1.00  22
10 5.45609  1.00000 5587G 93906M 5495G 1.64 0.60  20
13 5.45609  1.00000 5587G   188G 5398G 3.37 1.23  36
18 5.45609  1.00000 5587G   360G 5226G 6.45 2.35  40
19 5.45609  1.00000 5587G 98008M 5491G 1.71 0.62  24
23 5.45609  1.00000 5587G   278G 5308G 4.98 1.81  34
26 5.45609  1.00000 5587G   150G 5436G 2.70 0.98  33
28 5.45609  1.00000 5587G 61283M 5527G 1.07 0.39  24
              TOTAL  163T  4603G  159T 2.75
MIN/MAX VAR: 0.05/2.35  STDDEV: 1.40
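As a quick consistency check (my arithmetic, not part of the original comment), the totals in the table above line up: 30 OSDs of ~5587G each sum to roughly the reported 163T.

```python
# Sum the per-OSD sizes from "ceph osd df" and convert GiB -> TiB.
osds = 30
size_per_osd_gb = 5587
total_tb = osds * size_per_osd_gb / 1024  # matches the reported "163 TB avail" line
print(round(total_tb, 1))  # 163.7
```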


So it is the nova code that is not reporting the total ceph storage correctly. When someone logs in to the main admin tenant of RHOSP, the first glance at the overall resource availability/utilization shows the wrong information, which creates a negative impression.

Hence this needs a code fix, at high priority.

Let me know if you need more information from our side; I will be happy to share whatever you require.

Comment 17 melanie witt 2018-03-27 20:40:39 UTC
Hi Udayendu,

Which nova API have you used where you see the incorrect resource available/utilization?

There are two APIs where it is possible you are seeing incorrect information:

  1. 'nova hypervisor-stats' command. This shows an aggregated view of resource utilization across the entire cluster and is known to be incorrect when using shared storage, such as ceph. (Since it assumes local storage, it simply adds all of the reported storage per compute host together.)

  2. 'nova hypervisor-show' command. This shows the resource utilization for one compute host and could be incorrect if ceph has been deployed with > 1 OSD per HDD (which you have mentioned is *not* the case in your deployment).

If it is 1. that you are seeing incorrect, unfortunately that is not going to be fixed until the resource providers work upstream addresses shared storage [0]. The shared storage part of the spec has *not* yet been implemented.

If it is 2. that you are seeing incorrect (I don't think you should be if you have deployed only one OSD per HDD) then could you please try this patch [1] to see if it helps your issue?

[0] https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/generic-resource-pools.html
[1] https://review.openstack.org/#/c/556692

Comment 18 Udayendu Kar 2018-03-28 07:22:23 UTC
Hi Melanie,

We are seeing this behavior in the RHOSP 11 horizon dashboard. But when I run the command from option 1 as recommended, I get the following output:

# nova hypervisor-stats
+----------------------+---------+
| Property             | Value   |
+----------------------+---------+
| count                | 6       |
| current_workload     | 0       |
| disk_available_least | 975282  |
| free_disk_gb         | 995755  |
| free_ram_mb          | 3989554 |
| local_gb             | 1005660 |
| local_gb_used        | 30372   |
| memory_mb            | 4716594 |
| memory_mb_used       | 775685  |
| running_vms          | 62      |
| vcpus                | 480     |
| vcpus_used           | 469     |
+----------------------+---------+


My compute nodes have only 300 GB of HDD available, which is used for OS deployment. Apart from this no local storage is available; all storage comes from Ceph. In our ceph we have around 160 TB of storage available, but when we look at the overall storage utilization in horizon it shows completely wrong information.

Let me know if that makes sense. If you need more information, such as screenshots or any backend logs, feel free to ask.

At this point it is highly confusing and we need a fix. This is not good to have in an enterprise-grade product.

Thanks,
--Uday

Comment 19 melanie witt 2018-03-28 23:32:39 UTC
Hi Uday,

If you're seeing the behavior in the horizon dashboard, then I think you are seeing the result of the nova API equivalent of the 'nova hypervisor-stats' CLI command.

This command is known to be wrong in the case of shared storage. The incorrect value you are seeing is <available storage reported by ceph> * <number of compute hosts> because the logic is just adding available storage per compute host together to get a total. This is because there is currently no way to distinguish local storage from shared storage for each compute host.
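Plugging in the numbers from comment 18 illustrates this (my arithmetic, assuming all 6 hosts report the same shared ceph pool): dividing the aggregated disk_available_least by the host count recovers roughly the ~159 TB that ceph itself reports as available.

```python
# Undo the per-host summation to show each host reported the same ceph pool.
hosts = 6
reported_total_gb = 975282                # disk_available_least from hypervisor-stats
per_host_gb = reported_total_gb / hosts   # what each compute host reported
per_host_tb = per_host_gb / 1024          # ~159 TB, matching "ceph -s" above
print(round(per_host_tb, 1))  # 158.7
```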

The ongoing re-design upstream [0] of how resource reporting and consumption works will address the aforementioned design gap eventually, but work is still underway and the shared storage piece has not yet been implemented. When it is implemented, the issue in the horizon dashboard reporting will be fixed. However, the re-design work is not backportable, so the fix will only be available in the newest version of nova when the code finally lands.

To be clear, the 'nova hypervisor-stats' command and the horizon dashboard cannot be fixed until the shared storage work in [0], which is currently underway, is completed. The 'nova hypervisor-show' command (where > 1 OSD per HDD and replica size > 1) can be fixed with the patch [1].

[0] https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/generic-resource-pools.html
[1] https://review.openstack.org/#/c/556692

Comment 20 Udayendu Kar 2018-03-30 12:55:37 UTC
Hi Melanie,

As this can't be fixed until [0] is complete, we have to wait. I hope you will finish that work soon and we will see a better UI in the next RHOSP release, and upstream as well.

Thanks for working on it.

Comment 32 errata-xmlrpc 2020-02-06 14:37:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

Comment 33 Red Hat Bugzilla 2023-09-15 01:24:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days