Bug 1329190

Summary: Erasure coded pool creation succeeds in backend but request/{id} returns an error
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Shubhendu Tripathi <shtripat>
Component: Calamari
Assignee: Christina Meno <gmeno>
Calamari sub component: Back-end
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
CC: ceph-eng-bugs, gmeno, hnallurv, japplewh, kdreyer, nthomas, shtripat
Version: 2.0
Keywords: Reopened
Target Milestone: rc
Target Release: 2.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: calamari-server-1.4.4-1.el7cp; Ubuntu: calamari_1.4.4-2redhat1xenial
Doc Type: Bug Fix
Last Closed: 2016-08-23 19:36:47 UTC
Type: Bug
Bug Blocks: 1335525, 1339480, 1343229
Attachments:
crush map (see comment 23)

Description Shubhendu Tripathi 2016-04-21 11:29:43 UTC
Description of problem:
When we invoke an erasure-coded pool creation API from calamari, the command actually succeeds and the pool is created in the backend.

But when checking the status of the async task using the API /api/v2/request/{id}, it reports an error even for the success case.

Version-Release number of selected component (if applicable):
calamari-server-1.4.0-0.5.rc8.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a ceph cluster
2. Submit an erasure-coded pool creation request with the payload below

    {
        "name": "ecpool", 
        "size": 3, 
        "pg_num": 128, 
        "crush_ruleset": 1, 
        "pgp_num": 128, 
        "quota_max_objects": 0, 
        "quota_max_bytes": 0, 
        "type": "erasure", 
        "erasure_code_profile": "default"
    }

3. Check the status of the created task using the API /api/v2/request/{id}
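
For reference, a minimal sketch of steps 2 and 3 in Python using the requests library. The base URL, authentication handling, the exact pool-create endpoint (POST /api/v2/cluster/{fsid}/pool), and the request_id field in its response are assumptions for illustration, not confirmed Calamari client code.

# Hedged sketch of the reproduction steps; not Calamari's own client code.
import time
import requests

BASE = "http://calamari.example.com/api/v2"    # placeholder host
FSID = "d3e3f8b7-fa1e-48da-adeb-7c76178cc1e7"  # cluster fsid from this report

payload = {
    "name": "ecpool",
    "size": 3,               # triggers the spurious error for an EC pool
    "pg_num": 128,
    "crush_ruleset": 1,
    "pgp_num": 128,
    "quota_max_objects": 0,
    "quota_max_bytes": 0,
    "type": "erasure",
    "erasure_code_profile": "default",
}

session = requests.Session()  # assume authentication is already handled

resp = session.post("%s/cluster/%s/pool" % (BASE, FSID), json=payload)
request_id = resp.json()["request_id"]   # response field name assumed

# Step 3: poll the async task until it reaches the 'complete' state.
while True:
    task = session.get("%s/request/%s" % (BASE, request_id)).json()
    if task["state"] == "complete":
        print(task["error"], task["error_message"])
        break
    time.sleep(2)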

Actual results:
It shows the output below, with an error:

{
    "id": "150fcecf-45eb-410f-a0da-5f1b6a518004", 
    "state": "complete", 
    "error": true, 
    "error_message": "can not change the size of an erasure-coded pool", 
    "headline": "Creating pool 'ecpool'", 
    "status": "Failed (can not change the size of an erasure-coded pool)", 
    "requested_at": "2016-04-21T10:59:47.500948+00:00", 
    "completed_at": "2016-04-21T10:59:48.224439+00:00"
}

whereas the pool is actually created in the backend. The output from /api/v2/cluster/{fsid}/pool lists the pool as below:

    {
        "name": "ecpool", 
        "id": 6, 
        "size": 3, 
        "pg_num": 128, 
        "crush_ruleset": 1, 
        "min_size": 2, 
        "crash_replay_interval": 0, 
        "pgp_num": 128, 
        "hashpspool": true, 
        "full": false, 
        "quota_max_objects": 0, 
        "quota_max_bytes": 0, 
        "type": "erasure", 
        "erasure_code_profile": "default"
    }

Expected results:
In case of successful creation of an EC pool, the task status should not report an error.

Additional info:

Comment 2 Christina Meno 2016-04-21 17:14:42 UTC
Thank you for the great bug report. I appreciate the steps to reproduce.

Looks like calamari is not validating the set of fields that are applicable to EC pool creation. The reason it partially succeeds is that pool creation is a two-step process in calamari:

1. we create the pool with the fields that must be present.
2. we set any properties that are optional and have defaults

So you are seeing the failure of step two. I propose we move this to 2.1 because it's not critical IMHO; as I understand it, size is not applicable to an EC pool because that is controlled by the erasure code profile.

Can you ignore this specific error in the 2.0 release?
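
For illustration, a rough approximation of that two-step flow using the ceph CLI rather than Calamari's internal mon-command calls; treat it as a sketch, with the values mirroring the payload in this report.

# Hedged sketch of the two-step flow: create first, then set optional properties.
import subprocess

def ceph(*args):
    # Run a ceph CLI command (assumes an admin keyring on the host).
    return subprocess.check_output(("ceph",) + args)

# Step 1: create the pool with the fields that must be present.
ceph("osd", "pool", "create", "ecpool", "128", "128", "erasure", "default")

# Step 2: set the optional properties afterwards.
ceph("osd", "pool", "set", "ecpool", "crush_ruleset", "1")
ceph("osd", "pool", "set-quota", "ecpool", "max_bytes", "0")
ceph("osd", "pool", "set-quota", "ecpool", "max_objects", "0")
# Setting "size" in step 2 is what fails for an erasure-coded pool and
# produces the "can not change the size of an erasure-coded pool" error:
# ceph("osd", "pool", "set", "ecpool", "size", "3")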

Comment 3 Shubhendu Tripathi 2016-04-21 17:46:42 UTC
Gregory, 

The issue is that we have generic code which looks at the task output, and if errMsg is populated we return that error to the calling code.

This generic code gets invoked for any POST call to calamari, so it is certainly not good to ignore it: it could be a genuine error in some other case and we would miss it.

Comment 4 Jeff Applewhite 2016-04-22 15:50:28 UTC
What types of properties are optional as in Gregory's step 2?

Comment 5 Shubhendu Tripathi 2016-04-22 17:00:32 UTC
@Jeff, size is an optional field for ec pool creation. But we cannot ignore the error message, as the same function is used for all other POST requests.

Comment 6 Jeff Applewhite 2016-04-22 17:36:56 UTC
@Shubhendu yes, I agree - I was just trying to understand the flow.

Comment 7 Ken Dreyer (Red Hat) 2016-05-10 02:33:58 UTC
What is the next step on this BZ?

Comment 8 Christina Meno 2016-05-17 19:42:26 UTC
I need to make a new validating serializer for erasure coded pools here:
https://github.com/ceph/calamari/blob/1.4/rest-api/calamari_rest/serializers/v2.py#L75
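
A minimal sketch of what such a validating serializer could look like, assuming DRF 3-style validation; the class name and field set are illustrative, not the actual change to that file.

# Illustrative sketch only -- not the actual Calamari serializer.
from rest_framework import serializers

class ErasurePoolCreateSerializer(serializers.Serializer):
    name = serializers.CharField()
    type = serializers.CharField()
    pg_num = serializers.IntegerField()
    pgp_num = serializers.IntegerField(required=False)
    size = serializers.IntegerField(required=False)
    crush_ruleset = serializers.IntegerField(required=False)
    erasure_code_profile = serializers.CharField(required=False)
    quota_max_objects = serializers.IntegerField(required=False)
    quota_max_bytes = serializers.IntegerField(required=False)

    def validate(self, attrs):
        # Reject fields that do not apply to erasure-coded pools, so the
        # request fails validation instead of failing in step two.
        if attrs.get("type") == "erasure" and "size" in attrs:
            raise serializers.ValidationError(
                "'size' is not applicable to erasure-coded pools; "
                "it is controlled by the erasure code profile")
        return attrs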

Comment 9 Christina Meno 2016-05-19 17:12:28 UTC
In progress upstream; I expect to finish by May 20th.

Comment 10 Christina Meno 2016-05-23 23:22:53 UTC
https://github.com/ceph/calamari/pull/451

As soon as the tests pass I'll merge and tag, in ~30 min.

Comment 11 Christina Meno 2016-05-24 00:04:26 UTC
tagged as v1.4.0-rc13
Would you please build this downstream?

Comment 12 Ken Dreyer (Red Hat) 2016-05-24 00:08:18 UTC
I will do that first thing tomorrow.

Comment 15 Shubhendu Tripathi 2016-05-30 10:09:35 UTC
With this fix, do we now need to NOT pass the "size" parameter for ec-pool creation?

With the existing code in USM (which passes the size parameter), ec-pool creation now fails with "400 BAD REQUEST".

I tweaked the USM code a bit to not pass the size parameter, but this returns "500 INTERNAL SERVER ERROR".

Kindly suggest...

Comment 16 Christina Meno 2016-06-02 05:36:56 UTC
Looks like a bug.
I've got a fix https://github.com/ceph/calamari/pull/456 upstream.

Comment 17 Ken Dreyer (Red Hat) 2016-06-14 16:23:45 UTC
What version of Calamari will contain this change?

Comment 18 Christina Meno 2016-06-14 17:45:25 UTC
v1.4.1

Comment 20 Shubhendu Tripathi 2016-06-17 12:58:02 UTC
I modified the USM code to not pass the size parameter for EC pools now. But calamari creates the ec pool in the backend while the async task for it still does not report as "completed".

It still says "submitted", as below, and so after a minute USM times out saying ec pool creation failed:

-----------------------
{
    "id": "1a9bfe1e-31d4-44ec-bc6d-7c7b8148acc9", 
    "state": "submitted", 
    "error": false, 
    "error_message": "", 
    "headline": "Creating pool 'ecpool'", 
    "status": "Waiting for OSD map epoch 83", 
    "requested_at": "2016-06-17T12:36:42.714802+00:00", 
    "completed_at": null
}
-----------------------

Comment 21 Christina Meno 2016-06-21 04:23:40 UTC
Shubhendu,

calamari is waiting on several conditions, one of which is the OSD map reaching a specific epoch. Additionally, it waits for all PGs in the resulting pool to leave the 'creating' state before completing.
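
For what it's worth, a small sketch of the kind of check that second condition amounts to, shelling out to ceph status; the pgs_by_state layout assumed here is based on the JSON output of ceph -s in this release, and the code is illustrative, not Calamari's implementation.

# Hedged sketch: report whether any PGs are still in a 'creating' state.
import json
import subprocess

def pgs_still_creating():
    status = json.loads(subprocess.check_output(["ceph", "-s", "--format", "json"]))
    for bucket in status["pgmap"].get("pgs_by_state", []):
        if "creating" in bucket["state_name"]:
            return True
    return False

print("still creating" if pgs_still_creating() else "all PGs out of 'creating'")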

Would you please share the ceph -w output before and after trying to create this pool?

Comment 22 Christina Meno 2016-06-21 05:09:05 UTC
Also it would be really helpful to see the CRUSH map of the cluster you're using to create this pool. 

I've tested this out and it works for me.

Comment 23 Shubhendu Tripathi 2016-06-21 05:22:32 UTC
Created attachment 1170062 [details]
crush map

Comment 24 Shubhendu Tripathi 2016-06-21 05:22:49 UTC
Gregory,

I started ceph -w before invoking the erasure pool creation, and below is the output from when the ec pool creation request timed out in USM

=================================================================
[root@dhcp46-204 ~]# ceph -w
    cluster d3e3f8b7-fa1e-48da-adeb-7c76178cc1e7
     health HEALTH_OK
     monmap e1: 1 mons at {dhcp46-204=10.70.46.204:6789/0}
            election epoch 3, quorum 0 dhcp46-204
     osdmap e75: 5 osds: 5 up, 5 in
            flags sortbitwise
      pgmap v243: 1024 pgs, 2 pools, 228 bytes data, 8 objects
            185 MB used, 149 GB / 149 GB avail
                1024 active+clean

2016-06-21 10:31:02.879125 mon.0 [INF] pgmap v244: 1024 pgs: 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail; 0 B/s rd, 0 op/s
2016-06-21 10:31:03.920859 mon.0 [INF] pgmap v245: 1024 pgs: 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail; 0 B/s rd, 0 op/s
2016-06-21 10:31:04.949836 mon.0 [INF] pgmap v246: 1024 pgs: 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail; 7436 B/s rd, 9 op/s
2016-06-21 10:31:07.981476 mon.0 [INF] pgmap v247: 1024 pgs: 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail; 4008 B/s rd, 5 op/s
2016-06-21 10:31:09.000499 mon.0 [INF] pgmap v248: 1024 pgs: 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail; 6569 B/s rd, 8 op/s
2016-06-21 10:31:10.053953 mon.0 [INF] pgmap v249: 1024 pgs: 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail; 6985 B/s rd, 8 op/s
2016-06-21 10:32:18.697233 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd=[{"pool_type": "erasure", "prefix": "osd pool create", "pg_num": 512, "erasure_code_profile": "default", "format": "json", "pool": "ecpool0"}]: dispatch
2016-06-21 10:32:19.141610 mon.0 [INF] osdmap e76: 5 osds: 5 up, 5 in
2016-06-21 10:32:19.165852 mon.0 [INF] pgmap v250: 1024 pgs: 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:19.774692 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd='[{"pool_type": "erasure", "prefix": "osd pool create", "pg_num": 512, "erasure_code_profile": "default", "format": "json", "pool": "ecpool0"}]': finished
2016-06-21 10:32:19.834641 mon.0 [INF] osdmap e77: 5 osds: 5 up, 5 in
2016-06-21 10:32:19.835175 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd=[{"var": "crush_ruleset", "prefix": "osd pool set", "pool": "ecpool0", "val": 1, "format": "json"}]: dispatch
2016-06-21 10:32:19.853713 mon.0 [INF] pgmap v251: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:20.803552 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd='[{"var": "crush_ruleset", "prefix": "osd pool set", "pool": "ecpool0", "val": 1, "format": "json"}]': finished
2016-06-21 10:32:20.823650 mon.0 [INF] osdmap e78: 5 osds: 5 up, 5 in
2016-06-21 10:32:20.824220 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd=[{"field": "max_bytes", "prefix": "osd pool set-quota", "pool": "ecpool0", "val": "0", "format": "json"}]: dispatch
2016-06-21 10:32:20.857157 mon.0 [INF] pgmap v252: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:21.832442 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd='[{"field": "max_bytes", "prefix": "osd pool set-quota", "pool": "ecpool0", "val": "0", "format": "json"}]': finished
2016-06-21 10:32:21.839068 mon.0 [INF] osdmap e79: 5 osds: 5 up, 5 in
2016-06-21 10:32:21.839995 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd=[{"field": "max_objects", "prefix": "osd pool set-quota", "pool": "ecpool0", "val": "0", "format": "json"}]: dispatch
2016-06-21 10:32:21.845359 mon.0 [INF] pgmap v253: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:22.817904 mon.0 [INF] from='client.? 10.70.46.204:0/4228501899' entity='client.admin' cmd='[{"field": "max_objects", "prefix": "osd pool set-quota", "pool": "ecpool0", "val": "0", "format": "json"}]': finished
2016-06-21 10:32:22.848405 mon.0 [INF] osdmap e80: 5 osds: 5 up, 5 in
2016-06-21 10:32:22.888835 mon.0 [INF] pgmap v254: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:23.986663 mon.0 [INF] pgmap v255: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:25.001435 mon.0 [INF] pgmap v256: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 185 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:27.287198 mon.0 [INF] pgmap v257: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 186 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:28.300172 mon.0 [INF] pgmap v258: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 186 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:30.332188 mon.0 [INF] pgmap v259: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 186 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:32.368658 mon.0 [INF] pgmap v260: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 186 MB used, 149 GB / 149 GB avail
2016-06-21 10:32:33.390175 mon.0 [INF] pgmap v261: 1536 pgs: 512 creating, 1024 active+clean; 228 bytes data, 186 MB used, 149 GB / 149 GB avail

=================================================================


Also, the request output from the calamari django UI is as below

===========================================================

GET /api/v2/cluster/d3e3f8b7-fa1e-48da-adeb-7c76178cc1e7/request/c0cd4702-e232-4f45-9ea9-0df668a06036
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, HEAD, OPTIONS

{
    "id": "c0cd4702-e232-4f45-9ea9-0df668a06036", 
    "state": "submitted", 
    "error": false, 
    "error_message": "", 
    "headline": "Creating pool 'ecpool0'", 
    "status": "Waiting for OSD map epoch 80", 
    "requested_at": "2016-06-21T05:02:18.630201+00:00", 
    "completed_at": null
}

====================================================================

The cluster health goes to HEALTH_ERR after executing the ec pool creation, as shown below

=============================
[root@dhcp46-204 ~]# ceph -s
    cluster d3e3f8b7-fa1e-48da-adeb-7c76178cc1e7
     health HEALTH_ERR
            512 pgs are stuck inactive for more than 300 seconds
            512 pgs stuck inactive
            too many PGs per OSD (307 > max 300)
     monmap e1: 1 mons at {dhcp46-204=10.70.46.204:6789/0}
            election epoch 3, quorum 0 dhcp46-204
     osdmap e80: 5 osds: 5 up, 5 in
            flags sortbitwise
      pgmap v261: 1536 pgs, 3 pools, 228 bytes data, 8 objects
            186 MB used, 149 GB / 149 GB avail
                1024 active+clean
                 512 creating
============================


The crush map for the cluster is attached.

Comment 25 Shubhendu Tripathi 2016-06-21 06:03:37 UTC
As asked, below is the data payload we use for ec pool creation

{
    "name":                 <name of pool>,
    "quota_max_objects":    <int value>,
    "quota_max_bytes":      <int value>,
    "pg_num":               <pgs>,
    "pgp_num":              <pgs>,
    "type":                 "erasure",
    "erasure_code_profile": <EC profile name which we create at cluster creation>,
    "crush_ruleset":        <ruleset no>,
}

Comment 26 Christina Meno 2016-06-22 04:50:31 UTC
[root@dhcp46-204 ~]# crushtool -i crush.map --test --show-bad-mappings    --rule 2   --num-rep 1    --min-x 1 --max-x 10
bad mapping rule 2 x 1 num_rep 1 result [2147483647]
bad mapping rule 2 x 2 num_rep 1 result [2147483647]
bad mapping rule 2 x 3 num_rep 1 result [2147483647]
bad mapping rule 2 x 4 num_rep 1 result [2147483647]
bad mapping rule 2 x 5 num_rep 1 result [2147483647]
bad mapping rule 2 x 6 num_rep 1 result [2147483647]
bad mapping rule 2 x 7 num_rep 1 result [2147483647]
bad mapping rule 2 x 8 num_rep 1 result [2147483647]
bad mapping rule 2 x 9 num_rep 1 result [2147483647]
bad mapping rule 2 x 10 num_rep 1 result [2147483647]

the crushtool is saying that this rule 
rule erasure-code {
	ruleset 2
	type erasure
	min_size 3
	max_size 3
	step set_chooseleaf_tries 5
	step set_choose_tries 100
	step take default
	step chooseleaf indep 0 type host
	step emit
}


won't map to any OSDs; the [2147483647] results above mean that no OSD could be selected.
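
A small sketch that wraps the same crushtool invocation so a ruleset can be sanity-checked before it is handed to pool creation; the helper name is illustrative.

# Hedged sketch: re-run the crushtool test above and flag empty mappings.
import subprocess

def rule_maps_to_osds(crush_map_path, ruleset, num_rep, samples=10):
    out = subprocess.check_output([
        "crushtool", "-i", crush_map_path, "--test", "--show-bad-mappings",
        "--rule", str(ruleset), "--num-rep", str(num_rep),
        "--min-x", "1", "--max-x", str(samples),
    ], stderr=subprocess.STDOUT)
    return b"bad mapping" not in out

# For the attached map this returns False, matching the output above.
print(rule_maps_to_osds("crush.map", ruleset=2, num_rep=1))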

Comment 27 Christina Meno 2016-06-22 05:00:41 UTC
In this particular rule you're starting from the "default" root node:

root default {
	id -1		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
	item dhcp46-150 weight 0.000
	item dhcp46-151 weight 0.000
	item dhcp46-27 weight 0.000
	item dhcp46-8 weight 0.000
}

and trying to choose the required number of hosts and then get OSDs from those subtrees; none of the host nodes listed under "default" have OSDs, so this fails.

This is not a bug in calamari; you've made a crush map that contains some semantic errors. Perhaps in 2.1 calamari could help more here.

To make this work, you could change the erasure-code rule to start at the root node "general".

Comment 28 Shubhendu Tripathi 2016-06-22 05:31:11 UTC
@Nishanth, I feel some changes are required in how we create the crush nodes for the cluster.

Comment 29 Nishanth Thomas 2016-06-22 10:07:20 UTC
The problem here is that calamari creates a crush rule itself and uses it as the ruleset for the erasure-coded pool being created, rather than using the ruleset supplied by USM along with the request. Why is it done this way? Does calamari create a crush rule per ec pool? Ideally calamari should use the crush rule provided by USM rather than create its own.

Comment 31 Christina Meno 2016-06-22 18:55:41 UTC
Nishanth and Shubhendu,
Please pursue the workaround:
1. Create the erasure-coded pool and wait for the request endpoint to indicate that it's waiting to complete.
2. PATCH the pool to update the crush ruleset; creation should then succeed (see the sketch at the end of this comment).

In the meantime I'll attempt to fix the issue that create won't honor the crush_ruleset field during pool creation.
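
A minimal sketch of that workaround against the v2 REST API; the PATCH endpoint path, the pool id, and the auth handling are assumptions based on the endpoints already used in this report.

# Hedged sketch of the workaround: create the EC pool, then PATCH its
# crush_ruleset while the create request is still waiting to complete.
import requests

BASE = "http://calamari.example.com/api/v2"   # placeholder host
FSID = "d3e3f8b7-fa1e-48da-adeb-7c76178cc1e7"
session = requests.Session()                  # assume auth already handled

# 1. Create the pool; the request will sit in 'submitted' while PGs are created.
session.post("%s/cluster/%s/pool" % (BASE, FSID), json={
    "name": "ecpool0",
    "pg_num": 512,
    "pgp_num": 512,
    "type": "erasure",
    "erasure_code_profile": "default",
})

# 2. PATCH the pool to the ruleset supplied by USM (pool id assumed to be
#    looked up from /cluster/{fsid}/pool once the pool appears there).
POOL_ID = 7
session.patch("%s/cluster/%s/pool/%d" % (BASE, FSID, POOL_ID),
              json={"crush_ruleset": 1})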

Comment 32 Christina Meno 2016-06-24 15:56:10 UTC
https://github.com/ceph/calamari/releases/tag/v1.4.4

Comment 38 errata-xmlrpc 2016-08-23 19:36:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html