Description of problem:
After successfully deploying the Ceph cluster, the customer moved on to the overcloud deployment, which includes Ceph tuning steps such as creating additional CRUSH rules and pools. The deployment fails when it tries to set the (new) default CRUSH rule.

Version-Release number of selected component (if applicable):
RHOSP 17 + RHCS 5
openstack-tripleo-heat-templates.noarch 14.3.1-0.20221208160327.feca772.el9ost @openstack-17-for-rhel-9-x86_64-rpms
tripleo-ansible.noarch 3.3.1-0.20221208161844.fa5422f.el9ost @openstack-17-for-rhel-9-x86_64-rpms
cephadm.noarch 2:16.2.10-94.el9cp @rhos-17.0-RHCS-5

How reproducible:
Tune Ceph and run the deployment.

Steps to Reproduce:
1) Tune ceph-config.yaml in order to create a couple of new CRUSH rules, setting one of them as the default:

parameter_defaults:
  CephCrushRules:
    - name: HDD
      root: default
      type: host
      class: hdd
      default: true
    - name: SSD
      root: default
      type: host
      class: ssd
      default: false
  CinderRbdExtraPools: ssdpool
  CephPools:
    - name: ssdpool
      rule_name: SSD
      application: rbd
    - name: volumes
      target_size_ratio: 0.4
      application: rbd
    - name: images
      target_size_ratio: 0.1
      application: rbd
    - name: vms
      target_size_ratio: 0.3
      application: rbd

2) Run overcloud deploy.

Actual results:
The deployment fails because the task tries to use a non-existent socket (/var/run/ceph/ceph-mon.overcloud-controller-0.mydomain.com.asok)...

2023-08-02 21:25:37,195 p=150594 u=stack n=ansible | 2023-08-02 21:25:37.193048 | {uuid} | FATAL | insert new default crush rule into daemon to prevent restart | overcloud-controller-0 -> {ip} | item=overcloud-controller-0 | error={"ansible_loop_var": "item", "changed": false, "cmd": ["podman", "run", "--rm", "--net=host", "--ipc=host", "--volume", "/etc/ceph:/etc/ceph:z", "--volume", "/home/ceph-admin/assimilate_ceph.conf:/home/assimilate_ceph.conf:z", "--volume", "/var/run/ceph/{fsid}:/var/run/ceph:z", "--entrypoint", "ceph", "director17.ctlplane.mydomain.com:8787/rhceph/rhceph-5-rhel8:latest", "--admin-daemon", "/var/run/ceph/ceph-mon.overcloud-controller-0.mydomain.com.asok", "config", "set", "osd_pool_default_crush_rule", "1"], "delta": "0:00:00.456442", "end": "2023-08-02 21:25:37.152697", "item": "overcloud-controller-0", "msg": "non-zero return code", "rc": 22, "start": "2023-08-02 21:25:36.696255", "stderr": "admin_socket: exception getting command descriptions: [Errno 2] No such file or directory", "stderr_lines": ["admin_socket: exception getting command descriptions: [Errno 2] No such file or directory"], "stdout": "", "stdout_lines": []}

...while the admin socket file is:
/var/run/ceph/ceph-mon.overcloud-controller-0.asok

[root@overcloud-controller-0 {fsid}]# ls -las
total 0
0 drwxrwx---. 2 167 167 80 Aug 1 23:11 .
0 drwxr-xr-x. 3 root root 60 Aug 1 23:10 ..
0 srwxr-xr-x. 1 167 167 0 Aug 1 23:11 ceph-mgr.overcloud-controller-0.schoqv.asok
0 srwxr-xr-x. 1 167 167 0 Aug 1 23:10 ceph-mon.overcloud-controller-0.asok

Additional info:
cat etc/hosts |grep controller-0
{ip} overcloud-controller-0.mydomain.com overcloud-controller-0
{ip} overcloud-controller-0.storage.mydomain.com overcloud-controller-0.storage
{ip} overcloud-controller-0.storagemgmt.mydomain.com overcloud-controller-0.storagemgmt
{ip} overcloud-controller-0.internalapi.mydomain.com overcloud-controller-0.internalapi
{ip} overcloud-controller-0.tenant.mydomain.com overcloud-controller-0.tenant
{ip} overcloud-controller-0.external.mydomain.com overcloud-controller-0.external
{ip} overcloud-controller-0.ctlplane.mydomain.com overcloud-controller-0.ctlplane
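
To make the mismatch in the actual results easier to see, here is a minimal Python sketch that builds both socket names from the same host entry. It only illustrates the two paths seen in this report; the function name `mon_socket_candidates` is hypothetical, and the sketch does not claim to reproduce how the failing task derives its path.

```python
from pathlib import PurePosixPath


def mon_socket_candidates(fqdn: str, run_dir: str = "/var/run/ceph") -> dict:
    """Build the mon admin-socket path from the FQDN and from the short hostname.

    Hypothetical helper for illustration only; it mirrors the two file names
    that appear in this report, not the logic of the deployment task.
    """
    # Short hostname is everything before the first dot of the FQDN.
    short = fqdn.split(".", 1)[0]
    base = PurePosixPath(run_dir)
    return {
        "fqdn": str(base / f"ceph-mon.{fqdn}.asok"),
        "short": str(base / f"ceph-mon.{short}.asok"),
    }


paths = mon_socket_candidates("overcloud-controller-0.mydomain.com")
print(paths["fqdn"])   # the path passed to --admin-daemon in the failing command
print(paths["short"])  # the socket file that is actually listed on the controller
```

The first path matches the one in the failing `--admin-daemon` call, and the second matches the file shown by `ls -las`, so the two differ only in whether the domain part of the hostname is included.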
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days