Description of problem: When adding new OSDs to a cluster using ceph-deploy with 'osd_crush_initial_weight = 0' set, 'ceph df' reports 'MAX AVAIL' as 0 instead of the proper value until the weight is changed to 0.01; after that, ceph df displays proper numerical values. This causes problems for OpenStack Cinder in Kilo because it thinks there isn't any available space for new volumes.

Before adding an OSD:

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    589T     345T      243T         41.32
POOLS:
    NAME          ID     USED       %USED     MAX AVAIL     OBJECTS
    data          0      816M       0         102210G       376
    metadata      1      120M       0         102210G       94
    images        5      11990G     1.99      68140G        1536075
    volumes       6      63603G     10.54     68140G        16462022
    instances     8      5657G      0.94      68140G        1063602
    rbench        12     260M       0         68140G        22569
    scratch       13     40960      0         68140G        10

After adding an OSD:

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    590T     346T      243T         41.24
POOLS:
    NAME          ID     USED       %USED     MAX AVAIL     OBJECTS
    data          0      816M       0         0             376
    metadata      1      120M       0         0             94
    images        5      11990G     1.98      0             1536075
    volumes       6      63603G     10.52     0             16462022
    instances     8      5657G      0.94      0             1063602
    rbench        12     260M       0         0             22569
    scratch       13     40960      0         0             10

MAX AVAIL shows 0 for all pools.

Version-Release number of selected component (if applicable):
RHCS 1.3.1
RHEL 7.2

How reproducible:
Highly reproducible; I was able to reproduce on my cluster without any issues.

Steps to Reproduce:
1. Set up a cluster running RHCS 1.3.1 on RHEL 7.2 with 2 separate networks (cluster and public), 1 admin/deploy node, 3 MON nodes, and 3 OSD nodes (3 OSD disks and 3 separate journals each).
2. Edit ceph.conf to include: osd_crush_initial_weight = 0
3. Check 'ceph df' and verify that MAX AVAIL shows the expected value.
4. Add 1 additional OSD node with 3 OSDs and 3 separate journals; the new OSDs take the initial OSD CRUSH weight of 0 from the conf.
5. Check 'ceph df' and verify that the issue is seen, with MAX AVAIL at 0.
6. If the issue is seen, change the OSD CRUSH weight to 0.01 and recheck that MAX AVAIL shows proper numerical values.

Actual results:
MAX AVAIL shows 0.

Expected results:
MAX AVAIL shows proper numerical values for the pools.

Additional info:
Upstream Ceph tracker opened, as the issue is also seen on 0.94.5: http://tracker.ceph.com/issues/14710
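For reference, a minimal ceph.conf fragment carrying the setting used in the reproduction above; placing it in the [global] section is an assumption here (an [osd] section would also be a reasonable home for it):

```ini
[global]
# New OSDs are created with CRUSH weight 0, so they receive no data
# (and no rebalancing happens) until an operator reweights them.
osd_crush_initial_weight = 0
```

Once a new OSD is in, its weight can be raised with 'ceph osd crush reweight osd.<id> <weight>', which is the manual step the reproduction uses to restore the MAX AVAIL figures.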
Assigned bug to kchai per sjust request.
Better representation of test from 'ceph df' output:

Before adding an OSD:

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    589T     345T      243T         41.32
POOLS:
    NAME          ID     USED       %USED     MAX AVAIL     OBJECTS
    data          0      816M       0         102210G       376
    metadata      1      120M       0         102210G       94
    images        5      11990G     1.99      68140G        1536075
    volumes       6      63603G     10.54     68140G        16462022
    instances     8      5657G      0.94      68140G        1063602
    rbench        12     260M       0         68140G        22569
    scratch       13     40960      0         68140G        10

After adding an OSD:

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    590T     346T      243T         41.24
POOLS:
    NAME          ID     USED       %USED     MAX AVAIL     OBJECTS
    data          0      816M       0         0             376
    metadata      1      120M       0         0             94
    images        5      11990G     1.98      0             1536075
    volumes       6      63603G     10.52     0             16462022
    instances     8      5657G      0.94      0             1063602
    rbench        12     260M       0         0             22569
    scratch       13     40960      0         0             10
Merged into hammer: https://github.com/ceph/ceph/pull/6834
Fix is in 0.94.7.
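As a rough sketch of why the symptom occurs and what the fix changes (this is an illustrative Python model, not the actual Ceph source, and the function and parameter names here are hypothetical): Ceph projects a pool's MAX AVAIL from the most-constrained OSD, scaling each OSD's free space by its share of the CRUSH weight. A freshly created OSD with weight 0 has a share of 0, which collapses the minimum to 0; skipping zero-weight OSDs, as the fix does, restores a sane estimate.

```python
# Illustrative model of the MAX AVAIL symptom; not the actual Ceph code.
def pool_max_avail(osds, skip_zero_weight):
    """Project a pool's MAX AVAIL from the most-constrained OSD.

    osds: list of (crush_weight, kb_avail) pairs.
    skip_zero_weight=False models the buggy behavior; True models the
    fix (a zero-weight OSD holds no data, so it should be ignored).
    """
    considered = [(w, a) for w, a in osds
                  if not (skip_zero_weight and w == 0)]
    total_weight = sum(w for w, _ in considered)
    projections = []
    for weight, kb_avail in considered:
        share = weight / total_weight
        if share == 0:
            # A weight-0 OSD contributes a projection of 0, which
            # drags the minimum (and thus MAX AVAIL) down to 0.
            projections.append(0.0)
        else:
            # How much pool data this OSD's free space could support.
            projections.append(kb_avail / share)
    return min(projections)

# 12 weighted OSDs plus one new OSD created with CRUSH weight 0:
cluster = [(0.9, 900 * 1024 ** 2)] * 12 + [(0.0, 900 * 1024 ** 2)]
print(pool_max_avail(cluster, skip_zero_weight=False))  # 0.0 (the bug)
print(pool_max_avail(cluster, skip_zero_weight=True))   # ~12x one OSD's free space
```

With skip_zero_weight=True the zero-weight OSD simply drops out of both the weight sum and the minimum, which matches the reported behavior after the fix: MAX AVAIL returns to normal values even while the new OSD still has weight 0.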
Verified adding an OSD to a cluster with data and journal partitions on the same disk and on different disks; the "ceph df" command shows MAX AVAIL data. Output captured for reference.

[root@host1]# ceph osd tree
ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 10.79991 root default
-2  2.69998     host magna105
 0  0.89999         osd.0           up  1.00000          1.00000
 1  0.89999         osd.1           up  1.00000          1.00000
 2  0.89999         osd.2           up  1.00000          1.00000
-3  2.69998     host magna107
 3  0.89999         osd.3           up  1.00000          1.00000
 4  0.89999         osd.4           up  1.00000          1.00000
 5  0.89999         osd.5           up  1.00000          1.00000
-4  2.69998     host magna108
 6  0.89999         osd.6           up  1.00000          1.00000
 7  0.89999         osd.7           up  1.00000          1.00000
 8  0.89999         osd.8           up  1.00000          1.00000
-5  2.69997     host magna109
10  0.89999         osd.10          up  1.00000          1.00000
 9  0.89999         osd.9           up  1.00000          1.00000
11  0.89998         osd.11          up  1.00000          1.00000
-6        0     host magna110
12        0         osd.12          up  1.00000          1.00000

[root@host1]# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    12043G     11947G     98290M       0.80
POOLS:
    NAME      ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd       0      7173M     0.17      3655G         1836528
    pool1     1      2456M     0.06      3655G         628740
    pool2     2      2556M     0.06      3655G         654449

[root@host1]# ceph -v
ceph version 0.94.9-1.el7cp (72b3e852266cea8a99b982f7aa3dde8ca6b48bd3)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-1972.html