Bug 2228783
| Summary: | [OSP 17] overcloud deployment is failing trying to use a non-existing ceph admin socket | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Flavio Piccioni <fpiccion> |
| Component: | ceph-ansible | Assignee: | Teoman ONAY <tonay> |
| Status: | CLOSED DUPLICATE | QA Contact: | Yogev Rabl <yrabl> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 17.0 (Wallaby) | CC: | fpantano, gfidente, tonay |
| Target Milestone: | --- | Flags: | ifrangs: needinfo? (tonay) |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-08-04 10:58:36 UTC | Type: | Bug |
Description of problem:

After successfully deploying the Ceph cluster, the customer moved on to the overcloud deployment, which includes Ceph tuning steps such as creating additional CRUSH rules and pools. The deployment fails when it tries to set the (new) default CRUSH rule.

Version-Release number of selected component (if applicable):

RHOSP 17 + RHCS 5

    openstack-tripleo-heat-templates.noarch 14.3.1-0.20221208160327.feca772.el9ost @openstack-17-for-rhel-9-x86_64-rpms
    tripleo-ansible.noarch 3.3.1-0.20221208161844.fa5422f.el9ost @openstack-17-for-rhel-9-x86_64-rpms
    cephadm.noarch 2:16.2.10-94.el9cp @rhos-17.0-RHCS-5

How reproducible:

Tune Ceph and run the deployment.

Steps to Reproduce:

1) Tune ceph-config.yaml to create a couple of new CRUSH rules, setting one of them as the default:

    parameter_defaults:
      CephCrushRules:
        - name: HDD
          root: default
          type: host
          class: hdd
          default: true
        - name: SSD
          root: default
          type: host
          class: ssd
          default: false
      CinderRbdExtraPools: ssdpool
      CephPools:
        - name: ssdpool
          rule_name: SSD
          application: rbd
        - name: volumes
          target_size_ratio: 0.4
          application: rbd
        - name: images
          target_size_ratio: 0.1
          application: rbd
        - name: vms
          target_size_ratio: 0.3
          application: rbd

2) Run overcloud deploy.

Actual results:

The deployment fails because the task tries to use a non-existent admin socket (/var/run/ceph/ceph-mon.overcloud-controller-0.mydomain.com.asok)...

    2023-08-02 21:25:37,195 p=150594 u=stack n=ansible | 2023-08-02 21:25:37.193048 | {uuid} | FATAL | insert new default crush rule into daemon to prevent restart | overcloud-controller-0 -> {ip} | item=overcloud-controller-0 | error={"ansible_loop_var": "item", "changed": false, "cmd": ["podman", "run", "--rm", "--net=host", "--ipc=host", "--volume", "/etc/ceph:/etc/ceph:z", "--volume", "/home/ceph-admin/assimilate_ceph.conf:/home/assimilate_ceph.conf:z", "--volume", "/var/run/ceph/{fsid}:/var/run/ceph:z", "--entrypoint", "ceph", "director17.ctlplane.mydomain.com:8787/rhceph/rhceph-5-rhel8:latest", "--admin-daemon", "/var/run/ceph/ceph-mon.overcloud-controller-0.mydomain.com.asok", "config", "set", "osd_pool_default_crush_rule", "1"], "delta": "0:00:00.456442", "end": "2023-08-02 21:25:37.152697", "item": "overcloud-controller-0", "msg": "non-zero return code", "rc": 22, "start": "2023-08-02 21:25:36.696255", "stderr": "admin_socket: exception getting command descriptions: [Errno 2] No such file or directory", "stderr_lines": ["admin_socket: exception getting command descriptions: [Errno 2] No such file or directory"], "stdout": "", "stdout_lines": []}

...while the actual admin socket file is /var/run/ceph/ceph-mon.overcloud-controller-0.asok (short hostname, no domain):

    [root@overcloud-controller-0 {fsid}]# ls -las
    total 0
    0 drwxrwx---. 2 167  167  80 Aug  1 23:11 .
    0 drwxr-xr-x. 3 root root 60 Aug  1 23:10 ..
    0 srwxr-xr-x. 1 167  167   0 Aug  1 23:11 ceph-mgr.overcloud-controller-0.schoqv.asok
    0 srwxr-xr-x. 1 167  167   0 Aug  1 23:10 ceph-mon.overcloud-controller-0.asok

Additional info:

    cat etc/hosts |grep controller-0
    {ip} overcloud-controller-0.mydomain.com overcloud-controller-0
    {ip} overcloud-controller-0.storage.mydomain.com overcloud-controller-0.storage
    {ip} overcloud-controller-0.storagemgmt.mydomain.com overcloud-controller-0.storagemgmt
    {ip} overcloud-controller-0.internalapi.mydomain.com overcloud-controller-0.internalapi
    {ip} overcloud-controller-0.tenant.mydomain.com overcloud-controller-0.tenant
    {ip} overcloud-controller-0.external.mydomain.com overcloud-controller-0.external
    {ip} overcloud-controller-0.ctlplane.mydomain.com overcloud-controller-0.ctlplane
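
To make the mismatch concrete: the failing task addresses the mon admin socket by the node's FQDN, while the socket on disk is named after the short hostname. The following is a minimal Python sketch of a check that tries both spellings and reports which one exists. It is not the ceph-ansible fix and not part of any Ceph or TripleO API; the helper names `mon_socket_candidates` and `find_mon_socket` are hypothetical, and the only thing taken from this report is the `ceph-mon.<name>.asok` naming seen in the log and the `ls` output.

```python
import os
import tempfile
from typing import List, Optional


def mon_socket_candidates(run_dir: str, hostname: str) -> List[str]:
    """Return possible mon admin socket paths, FQDN-based name first.

    Hypothetical helper: only the ceph-mon.<name>.asok pattern comes from
    the bug report; everything else is an assumption for illustration.
    """
    short = hostname.split(".", 1)[0]
    names = [hostname] if short == hostname else [hostname, short]
    return [os.path.join(run_dir, f"ceph-mon.{name}.asok") for name in names]


def find_mon_socket(run_dir: str, hostname: str) -> Optional[str]:
    """Return the first candidate path that exists on disk, else None."""
    for path in mon_socket_candidates(run_dir, hostname):
        if os.path.exists(path):
            return path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        # Stand-in for the real socket: a plain file using the short-name form.
        stand_in = os.path.join(run_dir, "ceph-mon.overcloud-controller-0.asok")
        open(stand_in, "w").close()
        # The FQDN-based path is missing, so the short-name path is returned.
        print(find_mon_socket(run_dir, "overcloud-controller-0.mydomain.com"))
```

On an affected node, the same check could be pointed at /var/run/ceph/{fsid} with the node's FQDN to confirm that only the short-name socket is present before rerunning the deployment.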