Bug 1574995
Summary: [UPGRADES] Error during ceph upgrade: Error EINVAL: bad entity name
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Yurii Prokulevych <yprokule>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED CURRENTRELEASE
QA Contact: Vasishta <vashastr>
Severity: urgent
Priority: urgent
Version: 3.1
CC: adeza, aschoen, augol, ccamacho, ceph-eng-bugs, gabrioux, gfidente, gmeno, johfulto, jstransk, kdreyer, nthomas, sankarshan, scohen, shan, yprokule, yrabl
Target Milestone: rc
Target Release: 3.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc3.el7cp
Doc Type: If docs needed, set a value
Clone Of:
: 1578282
Last Closed: 2019-08-27 05:11:55 UTC
Type: Bug
Bug Blocks: 1548353
Description
Yurii Prokulevych
2018-05-04 13:33:58 UTC
Created attachment 1432569
osp12.inventory.yaml
Attaching the inventory used for the initial OSP12 deployment. Cmdline was:
ansible-playbook -vv /usr/share/ceph-ansible/site-docker.yml.sample --user tripleo-admin --become --become-user root --extra-vars {"ireallymeanit": "yes"} --inventory-file /tmp/ansible-mistral-actioncHuWZ0/inventory.yaml --private-key /tmp/ansible-mistral-actioncHuWZ0/ssh_private_key --skip-tags package-install,with_pkg
Why is the following failing to create the MGR keys?
https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/docker/main.yml#L97

2018-05-04 08:00:09,002 p=15296 u=mistral | TASK [ceph-mon : create ceph mgr keyring(s) when mon is containerized] *********
2018-05-04 08:00:09,002 p=15296 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-mon/tasks/docker/main.yml:97
2018-05-04 08:00:09,002 p=15296 u=mistral | Friday 04 May 2018 08:00:09 -0400 (0:00:00.040) 0:04:13.446 ************
2018-05-04 08:00:09,063 p=15296 u=mistral | [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{ groups.get(mgr_group_name, []) | length > 0 }}
2018-05-04 08:00:09,666 p=15296 u=mistral | failed: [192.168.24.6] (item=192.168.24.9) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", "--cluster", "ceph", "auth", "get-or-create", "mgr.controller-0", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *", "-o", "/etc/ceph/ceph.mgr.controller-0.keyring"], "delta": "0:00:00.313477", "end": "2018-05-04 12:00:09.763679", "item": "192.168.24.9", "msg": "non-zero return code", "rc": 22, "start": "2018-05-04 12:00:09.450202", "stderr": "Error EINVAL: bad entity name", "stderr_lines": ["Error EINVAL: bad entity name"], "stdout": "", "stdout_lines": []}
2018-05-04 08:00:10,273 p=15296 u=mistral | failed: [192.168.24.6] (item=192.168.24.14) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", "--cluster", "ceph", "auth", "get-or-create", "mgr.controller-1", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *", "-o", "/etc/ceph/ceph.mgr.controller-1.keyring"], "delta": "0:00:00.299650", "end": "2018-05-04 12:00:10.370583", "item": "192.168.24.14", "msg": "non-zero return code", "rc": 22, "start": "2018-05-04 12:00:10.070933", "stderr": "Error EINVAL: bad entity name", "stderr_lines": ["Error EINVAL: bad entity name"], "stdout": "", "stdout_lines": []}
2018-05-04 08:00:11,087 p=15296 u=mistral | failed: [192.168.24.6] (item=192.168.24.6) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", "--cluster", "ceph", "auth", "get-or-create", "mgr.controller-2", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *", "-o", "/etc/ceph/ceph.mgr.controller-2.keyring"], "delta": "0:00:00.356980", "end": "2018-05-04 12:00:11.185707", "item": "192.168.24.6", "msg": "non-zero return code", "rc": 22, "start": "2018-05-04 12:00:10.828727", "stderr": "Error EINVAL: bad entity name", "stderr_lines": ["Error EINVAL: bad entity name"], "stdout": "", "stdout_lines": []}
2018-05-04 08:00:11,090 p=15296 u=mistral | RUNNING HANDLER [ceph-defaults : set _mon_handler_called before restart] *******

controller0 (192.168.24.6) was running the jewel container while controller{1,2} were running the luminous container [1]. When you run the command that produces the mgr key against a jewel container, you get the error EINVAL: bad entity name [2] which caused the upgrade to fail. This task should not have been run on controller0 until, as a result of the upgrade, it was running a luminous container.
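The precondition stated above (do not create mgr keys against a monitor that is still on jewel) can be illustrated with a small Python sketch. It classifies monitor containers by image tag from `docker ps` output and lists the ones that are not yet on luminous. The function names and the tag-to-release mapping (`rhceph:2.x` as jewel, `rhceph:3-x` as luminous) are assumptions made for illustration, inferred from the container listings in this bug; none of this is ceph-ansible code.

```python
import re

# Assumed mapping, for illustration only: rhceph 2.x images run jewel,
# rhceph 3.x images run luminous (as reported in this bug).
RELEASE_BY_MAJOR = {2: "jewel", 3: "luminous"}

# Captures the major version that follows "rhceph:" in the image column.
IMAGE_RE = re.compile(r"rhceph:(\d+)")


def release_of(docker_ps_line):
    """Return the Ceph release for a `docker ps` line, or None if unknown."""
    match = IMAGE_RE.search(docker_ps_line)
    if match is None:
        return None
    return RELEASE_BY_MAJOR.get(int(match.group(1)))


def container_name(docker_ps_line):
    """The container name is the last whitespace-separated column."""
    return docker_ps_line.split()[-1]


def unsafe_for_mgr_keyring(docker_ps_lines):
    """List containers that are not (yet) running luminous."""
    return [
        container_name(line)
        for line in docker_ps_lines
        if release_of(line) != "luminous"
    ]


# Lines modelled on the sosreport output quoted in this bug.
sample = [
    'b39cc32df6b4 192.168.24.1:8787/rhceph:2.5-3 "/entrypoint.sh" 2 days ago Up 2 days ceph-mon-controller-0',
    'a5e2531f3589 192.168.24.1:8787/rhceph:3-6 "/entrypoint.sh" 2 days ago Up 2 days ceph-mon-controller-1',
    '14272b012166 192.168.24.1:8787/rhceph:3-6 "/entrypoint.sh" 2 days ago Up 2 days ceph-mon-controller-2',
]

print(unsafe_for_mgr_keyring(sample))
```

With the sample lines above, only `ceph-mon-controller-0` is reported, which matches the container that rejected the `mgr.*` entity name.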
footnotes:

[1]
[fultonj@skagra bz1574995]$ grep ceph control*-sos/sosreport-ceph-upgrade-fail-controller-*/sos_commands/docker/docker_ps
control0-sos/sosreport-ceph-upgrade-fail-controller-0-20180507054355/sos_commands/docker/docker_ps:b39cc32df6b4 192.168.24.1:8787/rhceph:2.5-3 "/entrypoint.sh" 2 days ago Up 2 days ceph-mon-controller-0
control1-sos/sosreport-ceph-upgrade-fail-controller-1-20180507055109/sos_commands/docker/docker_ps:a5e2531f3589 192.168.24.1:8787/rhceph:3-6 "/entrypoint.sh" 2 days ago Up 2 days ceph-mon-controller-1
control2-sos/sosreport-ceph-upgrade-fail-controller-2-20180507055837/sos_commands/docker/docker_ps:14272b012166 192.168.24.1:8787/rhceph:3-6 "/entrypoint.sh" 2 days ago Up 2 days ceph-mon-controller-2
[fultonj@skagra bz1574995]$

[2]
[root@controller-0 ~]# docker ps | grep ceph
a8bac73cc1b9 192.168.24.1:8787/rhceph:2.5-3 "/entrypoint.sh" 31 hours ago Up 31 hours ceph-mon-controller-0
[root@controller-0 ~]# docker exec ceph-mon-controller-0 ceph --cluster ceph auth get-or-create mgr.controller-0 mon allow profile mgr osd allow * mds allow * -o /etc/ceph/ceph.mgr.controller-0.keyring
Error EINVAL: bad entity name
[root@controller-0 ~]#

We tried to reproduce this bug on a new env with Yurii, and the update completed successfully, although the ceph-ansible version was still the same. I noticed that the mgr containers and the mgr keyrings were created correctly, and then the upgrade completed. It seems some patches were applied manually before the upgrade, but not in ceph-ansible itself. Yurii, could you give an update on this point?

By the way, I've tried several times to reproduce this issue on another env without the OSP layer, using the same versions of ceph-ansible (from v3.0.27 with jewel container images to v3.1.0beta8 with luminous container images); the upgrade worked fine on every attempt.

I've analyzed the rolling_update.yml playbook log on an env that got the failure.
The workflow is the following:
1/ It pulls the new image in ceph-docker-common, here: https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/rolling_update.yml#L120
2/ Still in ceph-docker-common, there are tasks that check whether a new image has been pulled; if so, they notify handlers. The handlers (which live in the ceph-defaults role) are triggered after all roles have finished.
3/ It keeps going and plays ceph-config and then ceph-mon: https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/rolling_update.yml#L121 and https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/rolling_update.yml#L122
4/ All roles have been played, so it's time to run the handlers; it seems the containers are restarted at that point.
Monitors are deployed using serial: 1, which means all of this is done one node at a time. On the last mon, since the 'ceph-mon' role is played before the handlers are triggered, the container is still running jewel when the mgr keyring creation task is called.

The patch has been merged upstream; it will be in v3.1.0rc2: https://github.com/ceph/ceph-ansible/releases/tag/v3.1.0rc2

Yes, running Ansible a second time will fix the issue.

Please list the content of /tmp/file-mistral-actionhvoeqB/7beb822a-575a-11e8-9b05-525400e6c600//etc/
Thanks

There is no such directory on uc:
[root@undercloud-0 (undercloud-12-US)~]# ll /tmp/file-mistral-actionhvoeqB/7beb822a-575a-11e8-9b05-525400e6c600/
ls: cannot access /tmp/file-mistral-actionhvoeqB/7beb822a-575a-11e8-9b05-525400e6c600/: No such file or directory

The issue reported in c16 is caused by a mismatch between the path where the mgr keys are fetched in rolling_update and the path where they are copied in ceph-mon.
Upstream patch: https://github.com/ceph/ceph-ansible/pull/2588/commits/d65bb8f9655d906054e634db67b02abcbb3ea837
It will be in v3.1.0rc4.

Yogev, would you please set qa_ack?

New tag with the fixes: https://github.com/ceph/ceph-ansible/releases/tag/v3.1.0rc3

Verified
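The ordering problem in steps 1/ to 4/ of the workflow analysis above can be modelled with a toy Python simulation. It is only a sketch under stated assumptions: the function `run_play`, its `restart_before_roles` switch, and the choice to model the mgr keyring task as running on the last mon only are hypothetical, and the "restart first" variant is one conceivable reordering, not a description of what the upstream patch actually does.

```python
def run_play(nodes, restart_before_roles):
    """Simulate a `serial: 1` play over monitor nodes.

    Every node starts on jewel. Pulling the new image only queues a
    restart; the running release changes when the restart happens.
    Returns (node, release) pairs seen by the mgr keyring task.
    """
    running = {node: "jewel" for node in nodes}
    seen = []
    for node in nodes:  # serial: 1 means one node at a time
        # Steps 1/ and 2/: new image pulled, restart handler notified.
        restart_queued = True
        if restart_before_roles:
            # Hypothetical reordering: restart before the roles run.
            running[node] = "luminous"
            restart_queued = False
        # Step 3/: ceph-mon role. Assumption: the mgr keyring task is
        # modelled as running on the last mon only.
        if node == nodes[-1]:
            seen.append((node, running[node]))
        # Step 4/: handlers fire only after all roles have been played.
        if restart_queued:
            running[node] = "luminous"
    return seen


mons = ["controller-0", "controller-1", "controller-2"]
print(run_play(mons, restart_before_roles=False))
print(run_play(mons, restart_before_roles=True))
```

In the first call the keyring task sees the last mon still on jewel, which is the situation that produced `Error EINVAL: bad entity name`; in the second call it sees luminous. The model also shows why a second Ansible run succeeds: by then every container has already been restarted on the new image.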