Bug 1236473 - Hypervisor summary shows incorrect total storage (Ceph)
Summary: Hypervisor summary shows incorrect total storage (Ceph)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 6.0 (Juno)
Assignee: Eoghan Glynn
QA Contact: nlevinki
URL:
Whiteboard:
Duplicates: 1248720
Depends On:
Blocks: 743661 1332165 1336237 1368279 1430245
 
Reported: 2015-06-29 07:24 UTC by Martin Schuppert
Modified: 2023-02-22 23:02 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
When using Red Hat Ceph as a back end for ephemeral storage, the Compute service does not calculate the amount of available storage correctly. Specifically, Compute simply adds up the amount of available storage without factoring in replication. This results in grossly overstated available storage, which in turn could cause unexpected storage oversubscription. To determine the correct ephemeral storage capacity, query the Ceph service directly instead.
Clone Of:
Clones: 1332165
Environment:
Last Closed: 2016-03-31 12:58:38 UTC
Target Upstream Version:
Embargoed:




Links: Launchpad 1387812

Description Martin Schuppert 2015-06-29 07:24:59 UTC
Description of problem:

When Ceph is used for ephemeral storage, Nova adds up the Ceph capacity reported by each compute node rather than using the real amount of Ceph storage.

E.g. in an OpenStack deployment with three controllers and six compute nodes, storage is provided by Ceph block storage in a Ceph storage cluster. Each OSD node has a dedicated local 1 TB hard disk, for a total Ceph capacity of about 2.7 TB. In the dashboard, each compute node reports the whole Ceph OSD capacity as its own storage capacity, so the overall capacity totals (number of computes) x (Ceph capacity). Instead of 2.7 TB we therefore see 16.3 TB of storage.

Each compute node also seems to report its free storage capacity as the whole Ceph capacity minus the storage used by the running VMs. As a result, OpenStack sees much more storage than really exists and allows storage oversubscription.

[root@controller-1 ~(openstack_admin)]# nova hypervisor-stats
+----------------------+--------+
| Property             | Value  |
+----------------------+--------+
| count                | 6      |
| current_workload     | 0      |
| disk_available_least | 16206  |
| free_disk_gb         | 12662  |
| free_ram_mb          | 599679 |
---> | local_gb             | 16722  |
| local_gb_used        | 4060   |
| memory_mb            | 772735 |
| memory_mb_used       | 173056 |
| running_vms          | 23     |
| vcpus                | 220    |
| vcpus_used           | 83     |
+----------------------+--------+

[root@controller-1 ~(openstack_admin)]# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    2787G     2701G       87739M          3.07
POOLS:
    NAME         ID     USED       %USED     MAX AVAIL     OBJECTS
    data         0           0         0         1348G           0
    metadata     1           0         0         1348G           0
    rbd          2           0         0         1348G           0
    images       3      12214M      0.43         1348G        1534
    volumes      4      33166M      1.16         1348G        7774
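
As a sanity check on the over-counting, the raw Ceph cluster size multiplied by the number of compute nodes reproduces the Nova figure exactly:

6 computes x 2787 GB (the ceph GLOBAL SIZE above) = 16722 GB, which is the local_gb value reported by nova hypervisor-stats.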

This will lead to problems if the Ceph cluster fills up while OpenStack still reports free storage for all or some compute nodes based on the Nova resource audit.
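
Until this is addressed in Nova, the real usable ephemeral capacity is best taken from Ceph itself. A minimal sketch, assuming the ephemeral/RBD pool is named "vms" (adjust to whatever pool the deployment actually uses):

ceph df                        # per-pool MAX AVAIL already accounts for replication; GLOBAL SIZE/AVAIL are raw
ceph osd pool get vms size     # prints the pool replica count, e.g. "size: 3"; raw capacity divided by this is roughly the usable capacity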

Version-Release number of selected component (if applicable):
openstack-nova-compute-2014.2.3-9.el7ost.noarch

How reproducible:
always

Steps to Reproduce:
1. Configure RBD usage as explained in http://ceph.com/docs/master/rbd/rbd-openstack/ (a minimal nova.conf sketch follows below)
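
For reference, a minimal nova.conf sketch of the libvirt/RBD ephemeral backend; the pool name, user, and secret UUID below are placeholders, not values taken from this report:

[libvirt]
images_type = rbd                          # store ephemeral disks as RBD images instead of local files
images_rbd_pool = vms                      # Ceph pool used for ephemeral disks (placeholder)
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder                          # cephx user with access to the pool (placeholder)
rbd_secret_uuid = <libvirt secret uuid>    # libvirt secret holding the cephx key (placeholder)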

Actual results:
Total disk capacity reported by OpenStack is the Ceph RBD capacity multiplied by the number of compute nodes.

Expected results:
Maximum disk capacity should be what the Ceph cluster reports.

Additional info:

upstream bug: https://bugs.launchpad.net/nova/+bug/1387812

Comment 2 Martin Schuppert 2015-06-29 07:36:10 UTC
Related to this is what is being discussed in:

"nova hypervisor-stats shows wrong disk usage with shared storage" [1]

Let me know if I should file a separate BZ for this.

[1] https://bugs.launchpad.net/nova/+bug/1414432

Comment 4 Sahid Ferdjaoui 2015-07-24 13:16:46 UTC
Several attempts have been made to fix this upstream, but nothing has been merged (all were abandoned).

Comment 8 Eoghan Glynn 2015-10-09 11:21:17 UTC
Martin:

There is an upstream spec proposed that will help fix this, but it's in the early stages of discussion:

  https://review.openstack.org/225546

The problem is relatively well understood, but it needs a redesign of various scheduler aspects to resolve. So while the discussion is currently underway, at best the timeline would be Mitaka/OSP 9, and possibly later.

Comment 9 Stephen Gordon 2015-11-06 15:20:00 UTC
*** Bug 1248720 has been marked as a duplicate of this bug. ***

