Description of problem:

Since using Ceph for ephemeral storage, Nova adds up the Ceph storage seen by each compute node rather than using the real amount of Ceph storage.

Example: an OpenStack deployment with three controllers and six compute nodes. Storage is provided by Ceph block storage in a Ceph storage cluster. Each OSD node has a dedicated local 1 TB hard disk, giving a total Ceph storage capacity of 2.7 TB. In the dashboard, each compute node reports the whole Ceph OSD storage as its own storage capacity, so the overall capacity comes out as number of computes x Ceph storage: instead of 2.7 TB we see 16.3 TB of storage.

Each compute node also reports its free storage capacity as the whole Ceph storage minus the storage used by its running VMs. As a result, OpenStack sees much more storage than really exists and allows oversubscription of storage.

[root@controller-1 ~(openstack_admin)]# nova hypervisor-stats
+----------------------+--------+
| Property             | Value  |
+----------------------+--------+
| count                | 6      |
| current_workload     | 0      |
| disk_available_least | 16206  |
| free_disk_gb         | 12662  |
| free_ram_mb          | 599679 |
---> | local_gb        | 16722  |
| local_gb_used        | 4060   |
| memory_mb            | 772735 |
| memory_mb_used       | 173056 |
| running_vms          | 23     |
| vcpus                | 220    |
| vcpus_used           | 83     |
+----------------------+--------+

[root@controller-1 ~(openstack_admin)]# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    2787G     2701G     87739M       3.07
POOLS:
    NAME         ID     USED       %USED     MAX AVAIL     OBJECTS
    data         0      0          0         1348G         0
    metadata     1      0          0         1348G         0
    rbd          2      0          0         1348G         0
    images       3      12214M     0.43      1348G         1534
    volumes      4      33166M     1.16      1348G         7774

This will lead to problems if the Ceph cluster fills up while OpenStack still reports free storage for some or all compute nodes based on the Nova audit.

Version-Release number of selected component (if applicable):
openstack-nova-compute-2014.2.3-9.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Configure RBD usage as explained in http://ceph.com/docs/master/rbd/rbd-openstack/

Actual results:
Disk capacity reported by OpenStack is the RBD cluster size * number of computes.

Expected results:
Maximum disk capacity is what is reported by the Ceph cluster.

Additional info:
Upstream bug: https://bugs.launchpad.net/nova/+bug/1387812
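For reference, the inflated local_gb figure above is exactly computes x Ceph cluster size. A minimal sketch of that arithmetic (not Nova's actual code), using the numbers from the output above:

# Minimal sketch, not Nova's actual code: each libvirt/RBD compute node
# reports the whole Ceph cluster size as its own local_gb, and
# "nova hypervisor-stats" simply sums the per-host values.

CEPH_CLUSTER_GB = 2787   # GLOBAL SIZE from "ceph df"
NUM_COMPUTES = 6         # "count" from "nova hypervisor-stats"

# Every compute node claims the full shared cluster as local capacity.
hypervisors = [{"local_gb": CEPH_CLUSTER_GB} for _ in range(NUM_COMPUTES)]

# The stats call sums the column across hosts, counting the shared
# Ceph capacity once per compute node.
total_local_gb = sum(h["local_gb"] for h in hypervisors)

print(total_local_gb)   # 16722 -- matches local_gb in the output above

The same per-host summation is why free_disk_gb and disk_available_least also come out larger than the real cluster capacity, which is what makes the oversubscription possible.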
Related to this is the discussion in "nova hypervisor-stats shows wrong disk usage with shared storage" [1]. Let me know if I should file a separate BZ for this.

[1] https://bugs.launchpad.net/nova/+bug/1414432
Several attempts have been made to fix this upstream, but nothing has been merged (all were abandoned).
Martin: There is an upstream spec proposed that would help fix this, but it is still in the early stages of discussion: https://review.openstack.org/225546. The problem is relatively well understood, but resolving it requires a redesign of various scheduler aspects. So while the discussion is currently underway, at best the timeline would be Mitaka/OSP 9, and possibly later.
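To illustrate the general kind of aggregation change needed (this is only a sketch, not what the upstream spec proposes; the "storage_backend_id" field, e.g. the Ceph cluster fsid, is a hypothetical identifier for the shared backend):

# Illustration only: count a shared storage backend once instead of
# once per compute node.
def total_disk_gb(hypervisors):
    seen_backends = set()
    total = 0
    for h in hypervisors:
        backend = h.get("storage_backend_id")
        if backend is None:
            # Truly local disk: count every host.
            total += h["local_gb"]
        elif backend not in seen_backends:
            # Shared backend: count its capacity only once.
            seen_backends.add(backend)
            total += h["local_gb"]
    return total

# With six computes all backed by the same 2787 GB Ceph cluster this
# returns 2787 rather than 16722.
hosts = [{"local_gb": 2787, "storage_backend_id": "ceph-fsid"} for _ in range(6)]
print(total_disk_gb(hosts))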
*** Bug 1248720 has been marked as a duplicate of this bug. ***