Bug 1339540
| Summary: | "nova hypervisor-show" is not considering ceph replica while calculating disk_available_least parameter. | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | VIKRANT <vaggarwa> | |
| Component: | openstack-nova | Assignee: | melanie witt <mwitt> | |
| Status: | CLOSED ERRATA | QA Contact: | Paras Babbar <pbabbar> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.0 (Kilo) | CC: | cshastri, dasmith, eglynn, kchamart, lyarwood, mbooth, mwitt, pbabbar, sbauza, sclewis, sgordon, srevivo, udayendu.kar, vromanso | |
| Target Milestone: | rc | Keywords: | Triaged | |
| Target Release: | 16.0 (Train on RHEL 8.1) | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | openstack-nova-20.0.1-0.20191025043858.390db63.el8ost | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1761623 1761625 1761627 (view as bug list) | Environment: | ||
| Last Closed: | 2020-02-06 14:37:21 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1761623, 1761625, 1761627 | |||
|
Description
VIKRANT
2016-05-25 09:39:38 UTC
I have been investigating this bug for the past week. First, I was able to reproduce the behavior in devstack with CEPH_REPLICAS=2 and the devstack-plugin-ceph. The disk allocated for ceph was 8G (the default created by devstack-plugin-ceph), and the available disk was reported as 15G because of the two OSDs: the devstack-plugin-ceph creates one OSD per replica, which results in two OSDs running on a single 8G disk.

Next, I saw that the available disk reported by the ceph tools was 15G using "ceph df", while each OSD reported 8G available in "ceph osd df". Nova calls ceph to query the available disk. I dug into the source of the ceph tools and found that it does indeed use the sum of each OSD's available disk to calculate the total available disk in the cluster, without regard for the number of replicas. So the available disk will be the sum of all the values in "ceph osd df". There appears to be an assumption in ceph that there is one OSD deployed per disk. I looked into the documentation [1] for more information. It says, "Tip Running multiple OSDs on a single disk–irrespective of partitions–is NOT a good idea." The reasoning is explained in the doc.

So, if you have deployed ceph with one OSD per disk, the total available disk reported by ceph will be correct. If you have deployed ceph with multiple OSDs per disk, the total reported will be too high, in proportion to how many OSDs you have overlapped per disk. The accuracy depends on how ceph has been deployed, which is why I think we can't "fix" this. When ceph is deployed as recommended, the value will be correct, and I think Nova is doing the right thing by querying ceph for the available disk and not doing extra calculations on top of it.

[1] http://docs.ceph.com/docs/hammer/start/hardware-recommendations/#hard-disk-drives

I am using RHOSP 11 in our production setup with 3 ceph nodes, each having 10 HDDs. So in total we have 30 HDDs, and each HDD has one OSD on it:
---
osdmap e617: 30 osds: 30 up, 30 in
flags sortbitwise,require_jewel_osds,recovery_deletes
---
# ceph -s
cluster 7bcf11d2-0bd1-11e8-ad27-80c16e714008
health HEALTH_WARN
too few PGs per OSD (28 < min 30)
monmap e1: 3 mons at {overcloud-controller-0=192.168.25.26:6789/0,overcloud-controller-1=192.168.25.29:6789/0,overcloud-controller-2=192.168.25.21:6789/0}
election epoch 2566, quorum 0,1,2 overcloud-controller-2,overcloud-controller-0,overcloud-controller-1
osdmap e617: 30 osds: 30 up, 30 in
flags sortbitwise,require_jewel_osds,recovery_deletes
pgmap v1636673: 288 pgs, 8 pools, 1546 GB data, 413 kobjects
4603 GB used, 159 TB / 163 TB avail
288 active+clean
client io 1207 kB/s rd, 1465 kB/s wr, 3915 op/s rd, 205 op/s wr
Ceph storage is deployed correctly and working as expected. On the storage servers I can see the ceph OSD details correctly:
# ceph osd df
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
0 5.45609 1.00000 5587G 7972M 5579G 0.14 0.05 27
3 5.45609 1.00000 5587G 248G 5338G 4.44 1.62 34
6 5.45609 1.00000 5587G 98374M 5490G 1.72 0.63 23
9 5.45609 1.00000 5587G 187G 5399G 3.35 1.22 28
12 5.45609 1.00000 5587G 53283M 5534G 0.93 0.34 28
14 5.45609 1.00000 5587G 202G 5384G 3.62 1.32 30
16 5.45609 1.00000 5587G 187G 5399G 3.35 1.22 31
20 5.45609 1.00000 5587G 136G 5450G 2.44 0.89 25
22 5.45609 1.00000 5587G 184G 5402G 3.31 1.20 35
25 5.45609 1.00000 5587G 232G 5354G 4.16 1.52 27
1 5.45609 1.00000 5587G 185G 5401G 3.32 1.21 23
5 5.45609 1.00000 5587G 266G 5320G 4.78 1.74 25
8 5.45609 1.00000 5587G 57337M 5531G 1.00 0.36 28
11 5.45609 1.00000 5587G 231G 5355G 4.15 1.51 37
15 5.45609 1.00000 5587G 107G 5479G 1.92 0.70 32
17 5.45609 1.00000 5587G 143G 5443G 2.57 0.94 24
21 5.45609 1.00000 5587G 115G 5471G 2.07 0.75 36
24 5.45609 1.00000 5587G 147G 5439G 2.64 0.96 34
27 5.45609 1.00000 5587G 93699M 5495G 1.64 0.60 21
29 5.45609 1.00000 5587G 188G 5398G 3.37 1.23 28
2 5.45609 1.00000 5587G 104G 5482G 1.87 0.68 24
4 5.45609 1.00000 5587G 53441M 5534G 0.93 0.34 31
7 5.45609 1.00000 5587G 152G 5434G 2.74 1.00 22
10 5.45609 1.00000 5587G 93906M 5495G 1.64 0.60 20
13 5.45609 1.00000 5587G 188G 5398G 3.37 1.23 36
18 5.45609 1.00000 5587G 360G 5226G 6.45 2.35 40
19 5.45609 1.00000 5587G 98008M 5491G 1.71 0.62 24
23 5.45609 1.00000 5587G 278G 5308G 4.98 1.81 34
26 5.45609 1.00000 5587G 150G 5436G 2.70 0.98 33
28 5.45609 1.00000 5587G 61283M 5527G 1.07 0.39 24
TOTAL 163T 4603G 159T 2.75
MIN/MAX VAR: 0.05/2.35 STDDEV: 1.40
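As a cross-check of the earlier observation that ceph computes the cluster total by summing per-OSD values with no replica awareness, the totals in the output above can be reproduced with a quick back-of-the-envelope calculation (a sketch of the arithmetic, not ceph's actual code):

```python
# Each of the 30 OSDs reports SIZE 5587G in `ceph osd df` above.
# The "163 T" total in `ceph -s` is just the plain sum of per-OSD
# sizes, with no adjustment for the pool's replica count.
osd_size_gb = 5587
num_osds = 30

total_gb = osd_size_gb * num_osds   # 167,610 GB
total_tib = total_gb / 1024         # ~163.7, matching the "163 T" total

print(total_gb, int(total_tib))
```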
So it's the nova code that's not reporting the total ceph storage correctly. And when someone logs in to the main admin tenant of RHOSP, the first glance at the overall resource availability/utilization shows the wrong information, which creates a negative impression.
Hence this needs a code fix, on high priority.
Let me know if you need more information from our side; I will be happy to share the info as per your requirement.
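For context on why the replica count matters here: with a pool replica size of 3, every byte written consumes three bytes of raw capacity, so the usable space is roughly the raw "avail" divided by the replica count. A minimal sketch of that adjustment (the function name is hypothetical; it is not nova's actual implementation):

```python
def usable_avail_gb(raw_avail_gb: int, replica_count: int) -> int:
    """Hypothetical helper: convert ceph's raw cluster 'avail'
    into usable capacity by dividing by the pool's replica count."""
    return raw_avail_gb // replica_count

# With ~159 TB raw avail and 3 replicas, only ~53 TB is actually usable.
print(usable_avail_gb(159 * 1024, 3))   # 54272 GB, i.e. ~53 TiB
```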
Hi Udayendu,

Which nova API have you used where you see the incorrect resource availability/utilization? There are two APIs where it is possible you are seeing incorrect information:

1. The 'nova hypervisor-stats' command. This shows an aggregated view of resource utilization across the entire cluster and is known to be incorrect when using shared storage, such as ceph. (Since it assumes local storage, it simply adds together all of the storage reported per compute host.)

2. The 'nova hypervisor-show' command. This shows the resource utilization for one compute host and could be incorrect if ceph has been deployed with > 1 OSD per HDD (which you have mentioned is *not* the case in your deployment).

If it is 1. that you are seeing incorrect, unfortunately that is not going to be fixed until the resource providers work upstream addresses shared storage [0]. The shared storage part of the spec has *not* yet been implemented.

If it is 2. that you are seeing incorrect (I don't think you should be, if you have deployed only one OSD per HDD), then could you please try this patch [1] to see if it helps your issue?

[0] https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/generic-resource-pools.html
[1] https://review.openstack.org/#/c/556692

Hi Melanie,

We are getting this behavior in the RHOSP 11 horizon dashboard. But when I run the command as recommended in option 1, I get the following output:

# nova hypervisor-stats
+----------------------+---------+
| Property             | Value   |
+----------------------+---------+
| count                | 6       |
| current_workload     | 0       |
| disk_available_least | 975282  |
| free_disk_gb         | 995755  |
| free_ram_mb          | 3989554 |
| local_gb             | 1005660 |
| local_gb_used        | 30372   |
| memory_mb            | 4716594 |
| memory_mb_used       | 775685  |
| running_vms          | 62      |
| vcpus                | 480     |
| vcpus_used           | 469     |
+----------------------+---------+

In my compute nodes only 300GB of HDD is available, which is used for OS deployment.
Apart from this no local storage is available; all storage is from Ceph. In our ceph we have around 160TB of storage available, but when we look at the overall storage utilization in horizon it shows completely wrong information.

Let me know if that makes sense. If you need more information, such as screenshots or any backend logs, feel free to let me know. At this point it is highly confusing and we need a fix for this; on an enterprise-grade product this is not good to have.

Thanks,
--Uday

Hi Uday,

If you're seeing the behavior in the horizon dashboard, then I think you are seeing the nova API result equivalent to the 'nova hypervisor-stats' CLI command. This command is known to be wrong in the case of shared storage. The incorrect value you are seeing is <available storage reported by ceph> * <number of compute hosts>, because the logic simply adds the available storage per compute host together to get a total. This is because there is currently no way to distinguish local storage from shared storage for each compute host.

The ongoing upstream re-design [0] of how resource reporting and consumption works will eventually address this design gap, but the work is still underway and the shared storage piece has not yet been implemented. When it is implemented, the issue in the horizon dashboard reporting will be fixed. However, the re-design work is not backportable, so the fix will only be available in the newest version of nova once the code finally lands.

To be clear: the 'nova hypervisor-stats' command and the horizon dashboard cannot be fixed until the shared storage work in [0] is completed, and that work is currently underway. The 'nova hypervisor-show' command (where > 1 OSD per HDD and replica size > 1) can be fixed with the patch [1].
[0] https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/generic-resource-pools.html
[1] https://review.openstack.org/#/c/556692

Hi Melanie,

As this can't be fixed until [0], we have to wait. I hope you will complete that patch soon, and we will see a better UI in the next RHOSP release and in upstream. Thanks for working on it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days
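The shared-storage double counting described in the comments above is visible directly in the hypervisor-stats numbers: with 6 compute hosts each reporting the same shared ceph pool as if it were local, nova's naive sum multiplies the pool size by the host count. A sketch of the arithmetic, assuming the per-host figure is simply the reported total divided by 6:

```python
# Each compute host reports the same shared ceph pool as if it were
# local storage, so nova's per-host sum multiplies it by the host count.
num_hosts = 6
per_host_avail_gb = 162_547    # ~159 TB shared pool (derived: 975282 / 6)

reported = num_hosts * per_host_avail_gb
print(reported)                # 975282, the disk_available_least shown above
```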