When using Red Hat Ceph as a back end for ephemeral storage, the Compute service does not calculate the amount of available storage correctly. Specifically, each Compute node reports the capacity of the entire shared Ceph cluster as its own, and these figures are then added together without factoring in replication. This results in grossly overstated available storage, which in turn could cause unexpected storage oversubscription.
To determine the correct ephemeral storage capacity, query the Ceph service directly instead.
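For example, the real cluster-wide capacity can be read with ceph df, or programmatically with a small script. The following is a minimal sketch, not part of the original report: it assumes the ceph CLI and a readable keyring are available on the host, and the JSON key names shown (total_bytes, total_avail_bytes) are those of recent Ceph releases and may differ on older clusters.

# Hedged sketch: read the real cluster capacity from Ceph itself instead of
# trusting nova hypervisor-stats. Assumes the 'ceph' CLI and a usable keyring.
import json
import subprocess

# Key names under "stats" vary between Ceph releases; adjust if needed.
raw = subprocess.check_output(["ceph", "df", "--format", "json"]).decode()
stats = json.loads(raw)["stats"]

total_gb = stats["total_bytes"] / 1024.0 ** 3
avail_gb = stats["total_avail_bytes"] / 1024.0 ** 3
print("Ceph raw capacity: %.0f GB, available: %.0f GB" % (total_gb, avail_gb))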
Description (Martin Schuppert, 2015-06-29 07:24:59 UTC):
Description of problem:
When Ceph is used for ephemeral storage, Nova adds up the Ceph capacity seen by each compute node rather than just using the real amount of Ceph storage.
For example, in an OpenStack deployment with three controllers and six compute nodes, storage is provided by Ceph block storage in a Ceph cluster. Each OSD node has a dedicated local 1 TB hard disk, for a total Ceph capacity of about 2.7 TB. In the dashboard, each compute node sees the whole Ceph OSD capacity as its own, so the overall capacity becomes (number of compute nodes) x (Ceph capacity). Instead of 2.7 TB we therefore see 16.3 TB of storage.
Each compute node also seems to report its free capacity as the whole Ceph storage minus the storage used by the running VMs. This means OpenStack sees much more storage than really exists and allows storage to be oversubscribed.
[root@controller-1 ~(openstack_admin)]# nova hypervisor-stats
     +----------------------+--------+
     | Property             | Value  |
     +----------------------+--------+
     | count                | 6      |
     | current_workload     | 0      |
     | disk_available_least | 16206  |
     | free_disk_gb         | 12662  |
     | free_ram_mb          | 599679 |
---> | local_gb             | 16722  |
     | local_gb_used        | 4060   |
     | memory_mb            | 772735 |
     | memory_mb_used       | 173056 |
     | running_vms          | 23     |
     | vcpus                | 220    |
     | vcpus_used           | 83     |
     +----------------------+--------+
[root@controller-1 ~(openstack_admin)]# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    2787G     2701G     87739M       3.07
POOLS:
    NAME         ID     USED       %USED     MAX AVAIL     OBJECTS
    data         0      0          0         1348G         0
    metadata     1      0          0         1348G         0
    rbd          2      0          0         1348G         0
    images       3      12214M     0.43      1348G         1534
    volumes      4      33166M     1.16      1348G         7774
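As a quick cross-check (my own arithmetic, not part of the original report), the inflated local_gb figure above is simply the raw Ceph cluster size counted once per hypervisor:

# Sanity check: nova's local_gb equals the raw Ceph cluster size (GLOBAL SIZE
# in 'ceph df' above) multiplied by the number of compute nodes.
computes = 6
ceph_raw_gb = 2787
print(computes * ceph_raw_gb)  # 16722 -- matches local_gb in hypervisor-stats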
This will lead to problems if the Ceph cluster fills up while OpenStack still reports free storage for all or some compute nodes based on the Nova resource audit.
Version-Release number of selected component (if applicable):
openstack-nova-compute-2014.2.3-9.el7ost.noarch
How reproducible:
always
Steps to Reproduce:
1. Configure RBD for ephemeral storage as explained in http://ceph.com/docs/master/rbd/rbd-openstack/ (see the configuration sketch below)
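For reference, a minimal sketch of the relevant nova.conf settings from that guide; the pool name, user, and secret UUID are placeholders taken from the guide's examples, not from this deployment.

# nova.conf (compute nodes) -- RBD-backed ephemeral storage, per the Ceph guide
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = <libvirt secret UUID>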
Actual results:
Disk capacity reported by OpenStack is the Ceph (RBD) capacity multiplied by the number of compute nodes.
Expected results:
Maximum disk capacity is what the Ceph cluster itself reports.
Additional info:
upstream bug: https://bugs.launchpad.net/nova/+bug/1387812
Related to this is what is being discussed in:
"nova hypervisor-stats shows wrong disk usage with shared storage" [1]
Let me know if I should file a separate BZ for this.
[1] https://bugs.launchpad.net/nova/+bug/1414432
Martin:
There is an upstream spec proposed that will help fix this, but it's in the early stages of discussion:
https://review.openstack.org/225546
The problem is relatively well understood, but it needs a redesign of various scheduler aspects to resolve. So while the discussion is currently underway, the timeline would at best be Mitaka/OSP 9, and possibly later.