1377269 – crush rule probably ignored

Bug 1377269 - crush rule probably ignored

Summary: crush rule probably ignored

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RADOS
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	2.1
Assignee:	Samuel Just
QA Contact:	ceph-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1358267
TreeView+	depends on / blocked

Reported:	2016-09-19 10:56 UTC by Filip Balák
Modified:	2022-02-21 18:05 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-09-20 11:47:20 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
crush map (4.06 KB, text/plain) 2016-09-19 10:56 UTC, Filip Balák	no flags	Details
View All

Description Filip Balák 2016-09-19 10:56:55 UTC

Created attachment 1202447 [details]
crush map

Description of problem:
There are several crush rule sets. Where all of them have two disks available. Disks are exclusively available only for one of any of those rule sets. Hence default rule set has only two disks. However they are full even when the pool using this rule set has zero objects.

Version-Release number of selected component (if applicable):
libcephfs1-10.2.2-41.el7cp.x86_64
ceph-common-10.2.2-41.el7cp.x86_64
ceph-selinux-10.2.2-41.el7cp.x86_64
python-cephfs-10.2.2-41.el7cp.x86_64
ceph-base-10.2.2-41.el7cp.x86_64


How reproducible:
80%

Steps to Reproduce:
1. Create cluster using Red Hat Storage Console 2.0. During creation create several new storage profiles. Each of them should have two disks. Leave two disks in default profile.
2. Create one pool for each storage profile with replication. Some of newly created pools have replication number set to 4. All others have replication number set to 2.
3. Start filling pools except the pool using the default profile.
4. Investigate ceph osd status.
E.g. (ceph poolDef is pool with default storage profile)
# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS 
 3 0.04900  1.00000 51173M 41606M  9567M 81.30 1.23 128 
10 0.04900  1.00000 51173M 41605M  9568M 81.30 1.23 128 
 4 0.04900  1.00000 51173M 45097M  6076M 88.12 1.33 128 
 6 0.04900  1.00000 51173M 45096M  6077M 88.12 1.33 128 
 1 0.04900  1.00000 51173M 41000M 10173M 80.12 1.21 128 
11 0.04900  1.00000 51173M 41000M 10173M 80.12 1.21 128 
 5 0.04900  1.00000 51173M  7185M 43988M 14.04 0.21 128 
 8 0.04900  1.00000 51173M  7186M 43987M 14.04 0.21 128 
 2 0.04900  1.00000 51173M 19516M 31657M 38.14 0.58 128 
 9 0.04900  1.00000 51173M 19516M 31657M 38.14 0.58 128 
 0 0.04900  1.00000 51173M 48754M  2419M 95.27 1.44 384 
 7 0.04900  1.00000 51173M 48756M  2417M 95.28 1.44 384 
              TOTAL   599G   396G   202G 66.17          
MIN/MAX VAR: 0.21/1.44  STDDEV: 29.60

# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED 
    599G      202G         396G         66.17 
POOLS:
    NAME        ID     USED       %USED     MAX AVAIL     OBJECTS 
    pool1       1      19476M     38.09        31657M          20 
    pool2       2       7144M     24.52        21993M           7 
    pool3       3      40960M     80.10        10173M          40 
    pool4       4      45056M     88.12         6076M          44 
    pool5       5      41564M     89.68         4783M          41 
    poolDef     6           0         0         2417M           0 

# ceph osd tree
ID  WEIGHT  TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-18 0.09799 root profil5                                                    
-16 0.04900     host dhcp-126-125-profil5                                   
  3 0.04900         osd.3                      up  1.00000          1.00000 
-17 0.04900     host dhcp-126-126-profil5                                   
 10 0.04900         osd.10                     up  1.00000          1.00000 
-15 0.09799 root profil4                                                    
-13 0.04900     host dhcp-126-125-profil4                                   
  4 0.04900         osd.4                      up  1.00000          1.00000 
-14 0.04900     host dhcp-126-126-profil4                                   
  6 0.04900         osd.6                      up  1.00000          1.00000 
-12 0.09799 root profil3                                                    
-10 0.04900     host dhcp-126-125-profil3                                   
  1 0.04900         osd.1                      up  1.00000          1.00000 
-11 0.04900     host dhcp-126-126-profil3                                   
 11 0.04900         osd.11                     up  1.00000          1.00000 
 -9 0.09799 root profil2                                                    
 -7 0.04900     host dhcp-126-125-profil2                                   
  5 0.04900         osd.5                      up  1.00000          1.00000 
 -8 0.04900     host dhcp-126-126-profil2                                   
  8 0.04900         osd.8                      up  1.00000          1.00000 
 -6 0.09799 root profil1                                                    
 -4 0.04900     host dhcp-126-125-profil1                                   
  2 0.04900         osd.2                      up  1.00000          1.00000 
 -5 0.04900     host dhcp-126-126-profil1                                   
  9 0.04900         osd.9                      up  1.00000          1.00000 
 -1 0.09799 root default                                                    
 -2 0.04900     host dhcp-126-125                                           
  0 0.04900         osd.0                      up  1.00000          1.00000 
 -3 0.04900     host dhcp-126-126                                           
  7 0.04900         osd.7                      up  1.00000          1.00000 

### all other pools have different ruleset ###
# ceph osd pool get poolDef crush_ruleset
crush_ruleset: 0

# ceph osd crush rule dump replicated_ruleset
{
    "rule_id": 0,
    "rule_name": "replicated_ruleset",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}


Actual results:
Even those two disks that are part of the default storage profile are filled.

Expected results:
No other than proper disks are filled.

Additional info:

Comment 15 Samuel Just 2016-09-19 13:40:58 UTC

This is correct (if weird) behavior.  The rulesets are created correctly, but size is set to 4, which is more than the rulesets can actually map.  It seems that the pools were originally created with the default ruleset, and then changed to use the custom rulesets.  For some of the pgs in the size=4 pools, the primary kept the old 2 osds mapped as well from the default pool since it couldn't delete them without going clean first (which it can't do, since it doesn't have 4 osds...).  This is odd behavior, but it's more understandable if you imagine that the pool had a bunch of data in it already.  Users would be unhappy if we deleted the old copies before we it replicated over to 4 new osds.

Note You need to log in before you can comment on or make changes to this bug.