Bug 1562220

Summary: openstack_config.yml task "assign rbd application to pool" incorrectly assigns "rbd" to metrics, when it should be openstack_gnocchi
Product: Red Hat Ceph Storage Reporter: John Fulton <johfulto>
Component: Ceph-Ansible    Assignee: leseb <shan>
Status: CLOSED ERRATA QA Contact: Rachana Patel <racpatel>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0    CC: adeza, aschoen, ceph-eng-bugs, gfidente, gmeno, johfulto, kdreyer, mcornea, nthomas, racpatel, sankarshan, shan, vpoliset, yprokule, yrabl
Target Milestone: rc    Keywords: Reopened
Target Release: 3.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-3.1.0-0.1.beta9.el7cp Ubuntu: ceph-ansible_3.1.0~rc3-2redhat1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-26 18:19:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1485415, 1548353, 1561169, 1583333    
Attachments:
ceph-ansible logs (flags: none)
Inventory from run ansible-mistral-actionMI1OZq (flags: none)

Description John Fulton 2018-03-29 19:57:24 UTC
When running rolling_update.yml twice on the same system the playbook fails on the "assign rbd application to pool" task: 

https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/openstack_config.yml#L17-L22

with the error below [1]. 

Perhaps the "when:" condition could be expanded to also check whether the pool already has an enabled application?

Using ceph-ansible-3.1.0-0.1.beta4.el7cp.noarch with the OSP13 beta and Luminous.

[1] 

2018-03-29 14:39:16,353 p=21727 u=mistral |  TASK [ceph-mon : assign rbd application to pool(s)] ****************************
2018-03-29 14:39:16,353 p=21727 u=mistral |  task path: /usr/share/ceph-ansible/roles/ceph-mon/tasks/openstack_config.yml:17
2018-03-29 14:39:16,353 p=21727 u=mistral |  Thursday 29 March 2018  14:39:16 -0400 (0:00:03.643)       0:02:59.193 ******** 
2018-03-29 14:39:18,114 p=21727 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'images'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", 
"--cluster", "ceph", "osd", "pool", "application", "enable", "images", "rbd"], "delta": "0:00:01.461224", "end": "2018-03-29 18:39:18.081302", "item": {"name": "images", "pg_num": 32, "rule_name": ""}, "rc": 0, 
"start": "2018-03-29 18:39:16.620078", "stderr": "enabled application 'rbd' on pool 'images'", "stderr_lines": ["enabled application 'rbd' on pool 'images'"], "stdout": "", "stdout_lines": []}
2018-03-29 14:39:18,813 p=21727 u=mistral |  failed: [192.168.24.18] (item={u'rule_name': u'', u'pg_num': 32, u'name': u'metrics'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "application", "enable", "metrics", "rbd"], "delta": "0:00:00.466702", "end": "2018-03-29 18:39:18.794722", "item": {"name": "metrics", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 1, "start": "2018-03-29 18:39:18.328020", "stderr": "Error EPERM: Are you SURE? Pool 'metrics' already has an enabled application; pass --yes-i-really-mean-it to proceed anyway", "stderr_lines": ["Error EPERM: Are you SURE? Pool 'metrics' already has an enabled application; pass --yes-i-really-mean-it to proceed anyway"], "stdout": "", "stdout_lines": []}
2018-03-29 14:39:20,115 p=21727 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'backups'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "application", "enable", "backups", "rbd"], "delta": "0:00:01.045684", "end": "2018-03-29 18:39:20.086214", "item": {"name": "backups", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-03-29 18:39:19.040530", "stderr": "enabled application 'rbd' on pool 'backups'", "stderr_lines": ["enabled application 'rbd' on pool 'backups'"], "stdout": "", "stdout_lines": []}
2018-03-29 14:39:21,134 p=21727 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'vms'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "application", "enable", "vms", "rbd"], "delta": "0:00:00.653458", "end": "2018-03-29 18:39:21.107759", "item": {"name": "vms", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-03-29 18:39:20.454301", "stderr": "enabled application 'rbd' on pool 'vms'", "stderr_lines": ["enabled application 'rbd' on pool 'vms'"], "stdout": "", "stdout_lines": []}
2018-03-29 14:39:22,134 p=21727 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "application", "enable", "volumes", "rbd"], "delta": "0:00:00.744918", "end": "2018-03-29 18:39:22.112935", "item": {"name": "volumes", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-03-29 18:39:21.368017", "stderr": "enabled application 'rbd' on pool 'volumes'", "stderr_lines": ["enabled application 'rbd' on pool 'volumes'"], "stdout": "", "stdout_lines": []}
2018-03-29 14:39:22,140 p=21727 u=mistral |  PLAY RECAP *********************************************************************
2018-03-29 14:39:22,140 p=21727 u=mistral |  192.168.24.10              : ok=65   changed=6    unreachable=0    failed=0   
2018-03-29 14:39:22,140 p=21727 u=mistral |  192.168.24.12              : ok=4    changed=0    unreachable=0    failed=0   
2018-03-29 14:39:22,140 p=21727 u=mistral |  192.168.24.15              : ok=4    changed=0    unreachable=0    failed=0   
2018-03-29 14:39:22,140 p=21727 u=mistral |  192.168.24.18              : ok=68   changed=5    unreachable=0    failed=1   
2018-03-29 14:39:22,140 p=21727 u=mistral |  192.168.24.22              : ok=4    changed=0    unreachable=0    failed=0   
2018-03-29 14:39:22,141 p=21727 u=mistral |  192.168.24.9               : ok=64   changed=7    unreachable=0    failed=0   
2018-03-29 14:39:22,141 p=21727 u=mistral |  localhost                  : ok=0    changed=0    unreachable=0    failed=0   
2018-03-29 14:39:22,141 p=21727 u=mistral |  Thursday 29 March 2018  14:39:22 -0400 (0:00:05.787)       0:03:04.981 ********

Comment 4 leseb 2018-04-12 12:39:40 UTC
Strange I have a different result:

root@ceph-nano-b-faa32aebf00b:/# ceph -v
ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
root@ceph-nano-b-faa32aebf00b:/# rados mkpool leseb 8
setting auid:8
successfully created pool leseb
root@ceph-nano-b-faa32aebf00b:/# ceph osd pool application enable leseb rbd
enabled application 'rbd' on pool 'leseb'
root@ceph-nano-b-faa32aebf00b:/# ceph osd pool application enable leseb rbd
enabled application 'rbd' on pool 'leseb'
root@ceph-nano-b-faa32aebf00b:/# ceph osd pool application enable leseb rbd
enabled application 'rbd' on pool 'leseb'

Comment 5 leseb 2018-04-12 12:41:11 UTC
Oh I got it, the command **is** idempotent, you just tried to assign a different application type, hence the warning.

Feel free to re-open if you have more concerns.

Comment 6 John Fulton 2018-04-13 17:04:06 UTC
(In reply to leseb from comment #5)
> Oh I got it, the command **is** idempotent, you just tried to assign a
> different application type, hence the warning.
> 
> Feel free to re-open if you have more concern.

I'm not sure we assigned a different application to it. The tag was already set to rbd as the logs from the earlier run show [1]. 

Maybe something else changed it in between. More testing could be done to monitor the tag during the upgrade to see if something else is setting it; if something is found, attribute the bug to that. The alternative is to code ceph-ansible defensively and only try to set the tag if it is not already set.

[1] https://bugzilla.redhat.com/attachment.cgi?id=1414867
2018-03-28 23:43:15,191 p=9685 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'metrics'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "application", "enable", "metrics", "rbd"], "delta": "0:00:00.738051", "end": "2018-03-29 03:43:15.171144", "item": {"name": "metrics", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-03-29 03:43:14.433093", "stderr": "enabled application 'rbd' on pool 'metrics'", "stderr_lines": ["enabled application 'rbd' on pool 'metrics'"], "stdout": "", "stdout_lines": []}

Comment 7 Marius Cornea 2018-04-19 01:19:30 UTC
I was able to reproduce this issue when trying to re-run the upgrade after a failed attempt:

2018-04-18 21:03:56,248 p=16352 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'images'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd",
 "pool", "application", "enable", "images", "rbd"], "delta": "0:00:00.885337", "end": "2018-04-19 01:03:56.433874", "item": {"name": "images", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-04-19 01:03:55.548537", "stderr": "enab
led application 'rbd' on pool 'images'", "stderr_lines": ["enabled application 'rbd' on pool 'images'"], "stdout": "", "stdout_lines": []}
2018-04-18 21:03:56,935 p=16352 u=mistral |  failed: [192.168.24.18] (item={u'rule_name': u'', u'pg_num': 32, u'name': u'metrics'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd
", "pool", "application", "enable", "metrics", "rbd"], "delta": "0:00:00.433200", "end": "2018-04-19 01:03:57.119135", "item": {"name": "metrics", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 1, "start": "2018-04-1
9 01:03:56.685935", "stderr": "Error EPERM: Are you SURE? Pool 'metrics' already has an enabled application; pass --yes-i-really-mean-it to proceed anyway", "stderr_lines": ["Error EPERM: Are you SURE? Pool 'metrics' already has an enable
d application; pass --yes-i-really-mean-it to proceed anyway"], "stdout": "", "stdout_lines": []}
2018-04-18 21:03:58,245 p=16352 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'backups'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd"
, "pool", "application", "enable", "backups", "rbd"], "delta": "0:00:01.068002", "end": "2018-04-19 01:03:58.430988", "item": {"name": "backups", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-04-19 01:03:57.362986", "stderr": "e
nabled application 'rbd' on pool 'backups'", "stderr_lines": ["enabled application 'rbd' on pool 'backups'"], "stdout": "", "stdout_lines": []}
2018-04-18 21:03:59,232 p=16352 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'vms'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "p
ool", "application", "enable", "vms", "rbd"], "delta": "0:00:00.595775", "end": "2018-04-19 01:03:59.416364", "item": {"name": "vms", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-04-19 01:03:58.820589", "stderr": "enabled appli
cation 'rbd' on pool 'vms'", "stderr_lines": ["enabled application 'rbd' on pool 'vms'"], "stdout": "", "stdout_lines": []}
2018-04-18 21:04:00,255 p=16352 u=mistral |  ok: [192.168.24.18] => (item={u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd"
, "pool", "application", "enable", "volumes", "rbd"], "delta": "0:00:00.793744", "end": "2018-04-19 01:04:00.443578", "item": {"name": "volumes", "pg_num": 32, "rule_name": ""}, "rc": 0, "start": "2018-04-19 01:03:59.649834", "stderr": "e
nabled application 'rbd' on pool 'volumes'", "stderr_lines": ["enabled application 'rbd' on pool 'volumes'"], "stdout": "", "stdout_lines": []}

Comment 8 leseb 2018-04-19 08:44:27 UTC
The error is pretty self-explanatory: if you get it, it's because the application assigned to the pool somehow changed. If the command has the same output every time, as shown in https://bugzilla.redhat.com/show_bug.cgi?id=1562220#c4, there is no problem. The error only occurs when you try to assign an application type that differs from the current one.

We need to either close this BZ and open a new one with the right description/title, or update this one's title/description.
As it stands, both the description and title are misleading and wrong.

So please let's do one of the two.

Thanks.

Comment 11 Marius Cornea 2018-04-24 03:16:07 UTC
Created attachment 1425777 [details]
ceph-ansible logs

Comment 12 Marius Cornea 2018-04-24 03:18:28 UTC
The error is reproducible when running the ceph upgrade deploy command (with environments/updates/update-from-ceph-newton.yaml) twice. I attached the logs from /var/log/mistral and /tmp/ansible-mistral*.

Comment 16 John Fulton 2018-04-24 15:38:24 UTC
Created attachment 1426133 [details]
Inventory from run ansible-mistral-actionMI1OZq

Comment 19 John Fulton 2018-04-26 14:35:11 UTC
Eureka! 

The following task from openstack_config.yml has a bug:

https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/openstack_config.yml#L17-L22

because it assigns rbd to every pool. However, the metrics pool is not of application type rbd; it should be of application type openstack_gnocchi:

[root@controller-0 ceph]# ceph osd dump | grep "^pool"
pool 0 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 86 flags hashpspool stripe_width 0 application rbd
pool 1 'metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 86 flags hashpspool stripe_width 0 application openstack_gnocchi
pool 2 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 91 flags hashpspool stripe_width 0 application rbd
pool 3 'backups' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 92 flags hashpspool stripe_width 0 application rbd
pool 4 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 94 flags hashpspool stripe_width 0 application rbd
pool 5 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 103 flags hashpspool stripe_width 0 application rbd
[root@controller-0 ceph]#

Comment 20 John Fulton 2018-04-26 14:50:00 UTC
(In reply to leseb from comment #8)
> The error is pretty slef-explanatory, if you get it it's because somehow the
> assigned pool changed. If the command has the same output all the time as
> shown in https://bugzilla.redhat.com/show_bug.cgi?id=1562220#c4 there is no
> problem. Once you try to assign a different application type that this is
> different from the current one the error occurs.

I agree. The pool that failed is metrics. It was correctly assigned the type openstack_gnocchi, but when the task "assign rbd application to pool(s)" ran, it tried to assign it rbd and we hit this error.

https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/openstack_config.yml#L17-L22

> We need to either, close this BZ and open a new one with the right
> description/title or update this one's title/description.
> But as is both description and title are misleading and wrong.
> 
> So please let's do one of the two.

Sure. I have updated the title. 

One way I can think of to address this is to change the "when:" to check not only that the version is Luminous but also that the pool doesn't already have an application type. The output of 'ceph osd dump | grep "^pool"' could be used for this.

Does that make sense?
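For illustration, a rough sketch of such a guard (the JSON field names pool_name and application_metadata, and the exact docker exec wrapper, are assumptions based on the logs above; the real patch may look different):

```yaml
# Sketch only: gather current pool applications first, then skip any
# pool that already has one assigned, so re-runs stay idempotent.
- name: collect current pool application assignments
  command: >
    docker exec ceph-mon-{{ ansible_hostname }}
    ceph --cluster {{ cluster }} osd dump --format json
  register: osd_dump
  changed_when: false

- name: assign rbd application to pool(s)
  command: >
    docker exec ceph-mon-{{ ansible_hostname }}
    ceph --cluster {{ cluster }} osd pool application enable {{ item.name }} rbd
  with_items: "{{ openstack_pools | unique }}"
  when:
    - ceph_release_num[ceph_release] >= ceph_release_num.luminous
    - (osd_dump.stdout | from_json).pools
      | selectattr('pool_name', 'equalto', item.name)
      | map(attribute='application_metadata')
      | first == {}
```

This would leave the metrics pool's openstack_gnocchi assignment untouched instead of trying to overwrite it with rbd.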

Comment 21 leseb 2018-04-26 15:33:42 UTC
More info on this: Ceph itself sets the pool application. This was introduced in https://github.com/ceph/ceph/commit/6d7ac66ae2e0cfd326cfbecb6c2c9ff953b74600 but removed in https://github.com/ceph/ceph/commit/afda7157aefc1d90368523e27efcbdd53d879a7e

So basically Mimic won't have this anymore, which means newer clusters won't have it.

To solve this nicely, we need to add a new key/value pair to the openstack_pools dict so we can set an application per pool.
Then this code can run without any issue: https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/openstack_config.yml#L17-L22

I'll work on a patch for this.
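A sketch of what that could look like (the key name "application" and the variable names are assumptions about the eventual patch, not its final form):

```yaml
# Hypothetical per-pool application key in group_vars:
openstack_glance_pool:
  name: images
  pg_num: 32
  rule_name: ""
  application: rbd

openstack_gnocchi_pool:
  name: metrics
  pg_num: 32
  rule_name: ""
  application: openstack_gnocchi

openstack_pools:
  - "{{ openstack_glance_pool }}"
  - "{{ openstack_gnocchi_pool }}"

# The task would then use the per-pool value instead of hard-coding rbd:
# - name: assign application to pool(s)
#   command: >
#     docker exec ceph-mon-{{ ansible_hostname }}
#     ceph --cluster {{ cluster }} osd pool application enable
#     {{ item.name }} {{ item.application }}
#   with_items: "{{ openstack_pools | unique }}"
#   when: item.application is defined
```

With this shape, the metrics pool would get openstack_gnocchi on first run and re-runs would repeat the same idempotent assignment rather than trip the EPERM guard.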

Comment 23 Giulio Fidente 2018-05-03 08:54:00 UTC
*** Bug 1574354 has been marked as a duplicate of this bug. ***

Comment 34 errata-xmlrpc 2018-09-26 18:19:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819