Bug 2228783 - [OSP 17] overcloud deployment is failing trying to use a non-existing ceph admin socket
Keywords:
Status: CLOSED DUPLICATE of bug 2210873
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: ceph-ansible
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Teoman ONAY
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-03 09:04 UTC by Flavio Piccioni
Modified: 2023-12-03 04:25 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-04 10:58:36 UTC
Target Upstream Version:
Embargoed:



Links
System                 ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  OSP-27163  0        None      None    None     2023-08-03 09:06:07 UTC

Description Flavio Piccioni 2023-08-03 09:04:06 UTC
Description of problem:
After successfully deploying the Ceph cluster, the customer moved on to the overcloud deployment, which includes Ceph tuning steps such as creating additional CRUSH rules and pools. The deployment fails at the step that sets the new default CRUSH rule.


Version-Release number of selected component (if applicable):
RHOSP 17 + RHCS 5

openstack-tripleo-heat-templates.noarch          14.3.1-0.20221208160327.feca772.el9ost   @openstack-17-for-rhel-9-x86_64-rpms 
tripleo-ansible.noarch                           3.3.1-0.20221208161844.fa5422f.el9ost    @openstack-17-for-rhel-9-x86_64-rpms 
cephadm.noarch                                   2:16.2.10-94.el9cp                       @rhos-17.0-RHCS-5              


How reproducible:
Tune Ceph as described below and run the deployment.


Steps to Reproduce:
1) Tune ceph-config.yaml to create two new CRUSH rules, setting one of them as the default:

parameter_defaults:
  CephCrushRules:
    - name: HDD
      root: default
      type: host
      class: hdd
      default: true
    - name: SSD
      root: default
      type: host
      class: ssd
      default: false
  CinderRbdExtraPools: ssdpool 
  CephPools:
    - name: ssdpool
      rule_name: SSD
      application: rbd
    - name: volumes
      target_size_ratio: 0.4
      application: rbd
    - name: images
      target_size_ratio: 0.1
      application: rbd
    - name: vms
      target_size_ratio: 0.3
      application: rbd
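
As a quick sanity check that the excerpt above is internally consistent (so the failure is not a simple typo in the parameters), the following is a minimal sketch in Python. The helper name check_crush_config and the two rules it enforces (at most one rule marked default, and every pool rule_name matching a defined rule name) are assumptions made for illustration; they are not taken from tripleo-ansible or ceph-ansible.

```python
# Values copied from the ceph-config.yaml excerpt, written as plain Python data.
crush_rules = [
    {"name": "HDD", "root": "default", "type": "host", "class": "hdd", "default": True},
    {"name": "SSD", "root": "default", "type": "host", "class": "ssd", "default": False},
]
pools = [
    {"name": "ssdpool", "rule_name": "SSD", "application": "rbd"},
    {"name": "volumes", "target_size_ratio": 0.4, "application": "rbd"},
    {"name": "images", "target_size_ratio": 0.1, "application": "rbd"},
    {"name": "vms", "target_size_ratio": 0.3, "application": "rbd"},
]


def check_crush_config(rules, pool_list):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    rule_names = [rule["name"] for rule in rules]
    defaults = [rule["name"] for rule in rules if rule.get("default")]
    # Assumption: only one rule should be flagged as the default.
    if len(defaults) > 1:
        problems.append("more than one default rule: " + ", ".join(defaults))
    # Assumption: a pool's rule_name must refer to a rule defined in CephCrushRules.
    for pool in pool_list:
        rule = pool.get("rule_name")
        if rule is not None and rule not in rule_names:
            problems.append(f"pool {pool['name']} uses undefined rule {rule}")
    return problems


print(check_crush_config(crush_rules, pools))
```

Under these assumptions the excerpt passes, which is consistent with the failure coming from the admin socket path rather than from the parameters themselves.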


2) Run the overcloud deploy.


Actual results:
The deployment fails because the task tries to use a non-existent admin socket (/var/run/ceph/ceph-mon.overcloud-controller-0.mydomain.com.asok)...

2023-08-02 21:25:37,195 p=150594 u=stack n=ansible | 2023-08-02 21:25:37.193048 | {uuid} |      FATAL | insert new default crush rule into daemon to prevent restart | overcloud-controller-0 -> {ip} | item=overcloud-controller-0 | error={"ansible_loop_var": "item", "changed": false, "cmd": ["podman", "run", "--rm", "--net=host", "--ipc=host", "--volume", "/etc/ceph:/etc/ceph:z", "--volume", "/home/ceph-admin/assimilate_ceph.conf:/home/assimilate_ceph.conf:z", "--volume", "/var/run/ceph/{fsid}:/var/run/ceph:z", "--entrypoint", "ceph", "director17.ctlplane.mydomain.com:8787/rhceph/rhceph-5-rhel8:latest", "--admin-daemon", "/var/run/ceph/ceph-mon.overcloud-controller-0.mydomain.com.asok", "config", "set", "osd_pool_default_crush_rule", "1"], "delta": "0:00:00.456442", "end": "2023-08-02 21:25:37.152697", "item": "overcloud-controller-0", "msg": "non-zero return code", "rc": 22, "start": "2023-08-02 21:25:36.696255", "stderr": "admin_socket: exception getting command descriptions: [Errno 2] No such file or directory", "stderr_lines": ["admin_socket: exception getting command descriptions: [Errno 2] No such file or directory"], "stdout": "", "stdout_lines": []}


...while the actual admin socket file is named after the short hostname, not the FQDN: /var/run/ceph/ceph-mon.overcloud-controller-0.asok


[root@overcloud-controller-0 {fsid}]# ls -las
total 0
0 drwxrwx---. 2  167  167 80 Aug  1 23:11 .
0 drwxr-xr-x. 3 root root 60 Aug  1 23:10 ..
0 srwxr-xr-x. 1  167  167  0 Aug  1 23:11 ceph-mgr.overcloud-controller-0.schoqv.asok
0 srwxr-xr-x. 1  167  167  0 Aug  1 23:10 ceph-mon.overcloud-controller-0.asok
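
The socket path in the failing command is resolved inside the container, so it has to be translated through the --volume arguments before it can be compared with the host directory listing above. The following is a minimal sketch of that translation; the function name host_path_for is hypothetical, the cmd list is copied from the error output, and {fsid} is kept as the redacted placeholder used in this report.

```python
# Command copied from the "cmd" field of the FATAL task output.
cmd = [
    "podman", "run", "--rm", "--net=host", "--ipc=host",
    "--volume", "/etc/ceph:/etc/ceph:z",
    "--volume", "/home/ceph-admin/assimilate_ceph.conf:/home/assimilate_ceph.conf:z",
    "--volume", "/var/run/ceph/{fsid}:/var/run/ceph:z",
    "--entrypoint", "ceph",
    "director17.ctlplane.mydomain.com:8787/rhceph/rhceph-5-rhel8:latest",
    "--admin-daemon", "/var/run/ceph/ceph-mon.overcloud-controller-0.mydomain.com.asok",
    "config", "set", "osd_pool_default_crush_rule", "1",
]


def host_path_for(command, container_path):
    """Hypothetical helper: map a container path to the host via --volume args."""
    best = None
    for flag, value in zip(command, command[1:]):
        if flag != "--volume":
            continue
        # Volume values have the form HOST_DIR:CONTAINER_DIR[:OPTIONS].
        host_dir, container_dir = value.split(":")[:2]
        prefix = container_dir.rstrip("/") + "/"
        if container_path == container_dir or container_path.startswith(prefix):
            # Prefer the longest matching container directory.
            if best is None or len(container_dir) > len(best[1]):
                best = (host_dir, container_dir)
    if best is None:
        return None
    host_dir, container_dir = best
    return host_dir + container_path[len(container_dir):]


socket_in_container = cmd[cmd.index("--admin-daemon") + 1]
print(host_path_for(cmd, socket_in_container))
```

The resulting host path ends in ceph-mon.overcloud-controller-0.mydomain.com.asok, a file that does not appear in the listing; only ceph-mon.overcloud-controller-0.asok does.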


Additional info:

cat etc/hosts |grep controller-0
{ip} overcloud-controller-0.mydomain.com overcloud-controller-0
{ip} overcloud-controller-0.storage.mydomain.com overcloud-controller-0.storage
{ip} overcloud-controller-0.storagemgmt.mydomain.com overcloud-controller-0.storagemgmt
{ip} overcloud-controller-0.internalapi.mydomain.com overcloud-controller-0.internalapi
{ip} overcloud-controller-0.tenant.mydomain.com overcloud-controller-0.tenant
{ip} overcloud-controller-0.external.mydomain.com overcloud-controller-0.external
{ip} overcloud-controller-0.ctlplane.mydomain.com overcloud-controller-0.ctlplane
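
The first hosts entry above carries both the FQDN and the short name for controller-0, which are the two names involved in the mismatch. The sketch below checks, for each name on that line, whether a matching ceph-mon socket appears in the directory listing from this report. The function name mon_sockets_for is hypothetical, and the idea that the failing task derives the socket name from the FQDN is inferred from the log output, not confirmed against the task's source.

```python
# First /etc/hosts entry for controller-0, with {ip} kept as the redacted placeholder.
hosts_line = "{ip} overcloud-controller-0.mydomain.com overcloud-controller-0"

# Socket file names taken from the ls output in this report.
socket_dir_listing = [
    "ceph-mgr.overcloud-controller-0.schoqv.asok",
    "ceph-mon.overcloud-controller-0.asok",
]


def mon_sockets_for(line, listing):
    """Hypothetical helper: report which hostnames have a ceph-mon.<name>.asok file."""
    _ip, *names = line.split()
    return {name: f"ceph-mon.{name}.asok" in listing for name in names}


for name, present in mon_sockets_for(hosts_line, socket_dir_listing).items():
    print(name, "socket present" if present else "socket missing")
```

Only the short hostname has a socket, which matches the path difference shown in the actual results.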

Comment 6 Red Hat Bugzilla 2023-12-03 04:25:13 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

