Description of problem:

DistributedComputeHCIScaleOut should have only CephOSD, and neither CephMon nor CephMgr.

~~~
https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/roles/DistributedComputeHCIScaleOut.yaml#L29-L31

    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CephOSD
~~~

However, CephMon and CephMgr are deployed on DistributedComputeHCIScaleOut in my RHOSP 17.0.1 lab.

~~~
[root@dcn0-compute-0 ~]# podman ps |grep ceph
fa4985f495f8  undercloud.ctlplane.yatanaka.example.com:8787/rhceph/rhceph-5-rhel8@sha256:b25f6178c91483c5248f9794122f1f6731e42cbc8ddba8402c7a9e2911e0e874  -n client.crash.d...  3 hours ago  Up 3 hours ago  ceph-d961401d-50e0-50ac-a40f-ef07cbc752a6-crash-dcn0-compute-0
30bcde9123aa  undercloud.ctlplane.yatanaka.example.com:8787/openshift4/ose-prometheus-node-exporter:v4.6  --no-collector.ti...  3 hours ago  Up 3 hours ago  ceph-d961401d-50e0-50ac-a40f-ef07cbc752a6-node-exporter-dcn0-compute-0
9be8e742b6b4  undercloud.ctlplane.yatanaka.example.com:8787/rhceph/rhceph-5-rhel8@sha256:b25f6178c91483c5248f9794122f1f6731e42cbc8ddba8402c7a9e2911e0e874  -n mon.dcn0-compu...  3 hours ago  Up 3 hours ago  ceph-d961401d-50e0-50ac-a40f-ef07cbc752a6-mon-dcn0-compute-0
28b3bd3654dc  undercloud.ctlplane.yatanaka.example.com:8787/rhceph/rhceph-5-rhel8@sha256:b25f6178c91483c5248f9794122f1f6731e42cbc8ddba8402c7a9e2911e0e874  -n mgr.dcn0-compu...  3 hours ago  Up 3 hours ago  ceph-d961401d-50e0-50ac-a40f-ef07cbc752a6-mgr-dcn0-compute-0-efkzri

(undercloud) [stack@undercloud ~]$ ssh heat-admin sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring /etc/ceph/dcn0.client.admin.keyring -- ceph orch ps
Inferring fsid d961401d-50e0-50ac-a40f-ef07cbc752a6
crash.dcn0-compute-0             dcn0-compute-0                        running (2h)  3m ago  2d  6627k  -      16.2.10-187.el8cp  72d512a15e58  fa4985f495f8  <=========(*)DistributedComputeHCIScaleOut
crash.dcn0-computehci-0          dcn0-computehci-0                     running (2h)  3m ago  2d  6627k  -      16.2.10-187.el8cp  72d512a15e58  97d594b3eb84
crash.dcn0-computehci-1          dcn0-computehci-1                     running (2h)  3m ago  2d  6627k  -      16.2.10-187.el8cp  72d512a15e58  22e6d923190d
crash.dcn0-computehci-2          dcn0-computehci-2                     running (2h)  3m ago  2d  6627k  -      16.2.10-187.el8cp  72d512a15e58  a499d142239f
mgr.dcn0-compute-0.efkzri        dcn0-compute-0                        running (2h)  3m ago  2d  395M   -      16.2.10-187.el8cp  72d512a15e58  28b3bd3654dc  <=========(*)DistributedComputeHCIScaleOut
mgr.dcn0-computehci-0.nlakit     dcn0-computehci-0  *:9283             running (2h)  3m ago  2d  470M   -      16.2.10-187.el8cp  72d512a15e58  ea200fa686bb
mgr.dcn0-computehci-1.rioxdc     dcn0-computehci-1                     running (2h)  3m ago  2d  403M   -      16.2.10-187.el8cp  72d512a15e58  f3a9b33ebb6c
mgr.dcn0-computehci-2.xuhuhm     dcn0-computehci-2                     running (2h)  3m ago  2d  403M   -      16.2.10-187.el8cp  72d512a15e58  e74abfdf8bde
mon.dcn0-compute-0               dcn0-compute-0                        running (2h)  3m ago  2d  128M   2048M  16.2.10-187.el8cp  72d512a15e58  9be8e742b6b4  <=========(*)DistributedComputeHCIScaleOut
mon.dcn0-computehci-0            dcn0-computehci-0                     running (2h)  3m ago  2d  273M   2048M  16.2.10-187.el8cp  72d512a15e58  25cedb06cdfd
mon.dcn0-computehci-1            dcn0-computehci-1                     running (2h)  3m ago  2d  229M   2048M  16.2.10-187.el8cp  72d512a15e58  18ae06f39ea1
mon.dcn0-computehci-2            dcn0-computehci-2                     running (2h)  3m ago  2d  187M   2048M  16.2.10-187.el8cp  72d512a15e58  5f86d45a7f0a
node-exporter.dcn0-compute-0     dcn0-compute-0     172.16.1.100:9100  running (2h)  3m ago  2d  22.1M  -      1.0.1              c8af8d642c9a  30bcde9123aa  <=========(*)DistributedComputeHCIScaleOut
node-exporter.dcn0-computehci-0  dcn0-computehci-0  172.16.1.34:9100   running (2h)  3m ago  2d  23.0M  -      1.0.1              c8af8d642c9a  e7606e183ced
node-exporter.dcn0-computehci-1  dcn0-computehci-1  172.16.1.104:9100  running (2h)  3m ago  2d  23.5M  -      1.0.1              c8af8d642c9a  1bde9c8b9b9b
node-exporter.dcn0-computehci-2  dcn0-computehci-2  172.16.1.96:9100   running (2h)  3m ago  2d  21.6M  -      1.0.1              c8af8d642c9a  a2a77f39a9c1
~~~

When I check the cephadm spec file generated by TripleO, I can see that Mon/Mgr/Osd are scheduled on the DistributedComputeHCIScaleOut node as well as on the DistributedComputeHCI nodes.

~~~
(undercloud) [stack@undercloud ~]$ cat overcloud-deploy/dcn0/generated_ceph_spec.yaml
:
placement:
  hosts:
  - dcn0-computehci-0    <==============(*) DistributedComputeHCI
  - dcn0-computehci-1    <==============(*) DistributedComputeHCI
  - dcn0-computehci-2    <==============(*) DistributedComputeHCI
  - dcn0-compute-0       <==============(*) DistributedComputeHCIScaleOut
service_id: mon
service_name: mon
service_type: mon
---
placement:
  hosts:
  - dcn0-computehci-0    <==============(*) DistributedComputeHCI
  - dcn0-computehci-1    <==============(*) DistributedComputeHCI
  - dcn0-computehci-2    <==============(*) DistributedComputeHCI
  - dcn0-compute-0       <==============(*) DistributedComputeHCIScaleOut
service_id: mgr
service_name: mgr
service_type: mgr
---
data_devices:
  all: true
placement:
  hosts:
  - dcn0-computehci-0    <==============(*) DistributedComputeHCI
  - dcn0-computehci-1    <==============(*) DistributedComputeHCI
  - dcn0-computehci-2    <==============(*) DistributedComputeHCI
  - dcn0-compute-0       <==============(*) DistributedComputeHCIScaleOut
service_id: default_drive_group
service_name: osd.default_drive_group
service_type: osd
~~~

This bug comes from the following code. The regular expression it builds is too permissive.
~~~
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/ansible_plugins/modules/ceph_spec_bootstrap.py#L280

pat = host_fmt.replace('%stackname%', '.*').replace('-%index%', '')
reg = re.compile(pat)
matching_hosts = []
for host in name_map:
    if reg.match(host):
        matching_hosts.append(name_map[host])
~~~

I debugged this code with rpdb, and the result is shown below. The regex for the DistributedComputeHCI role is '.*-distributedcomputehci'. Ideally, it should match only DistributedComputeHCI nodes. However, because "distributedcomputehci" is a prefix of "distributedcomputehciscaleout", it matches both DistributedComputeHCI nodes and DistributedComputeHCIScaleOut nodes.

~~~
(Pdb) p reg
re.compile('.*-distributedcomputehci')    <=============(*) wrong regex
(Pdb) print(json.dumps(name_map, indent=2))
{
  "dcn0-distributedcomputehci-0": "dcn0-computehci-0",
  "dcn0-distributedcomputehci-1": "dcn0-computehci-1",
  "dcn0-distributedcomputehci-2": "dcn0-computehci-2",
  "dcn0-distributedcomputehciscaleout-0": "dcn0-compute-0"    <=============(*) This regex also matches DistributedComputeHCIScaleOut node wrongly.
}
~~~

That's why Mon/Mgr are deployed on DistributedComputeHCIScaleOut as well as DistributedComputeHCI.

I think this regex should be '.*-distributedcomputehci-', not '.*-distributedcomputehci'.

Version-Release number of selected component (if applicable):
RHOSP 17.0.1

How reproducible:

Steps to Reproduce:
1. Run `openstack overcloud ceph deploy` for a DCN site with 3 DistributedComputeHCI nodes and at least 1 DistributedComputeHCIScaleOut node.

Actual results:
DistributedComputeHCIScaleOut has MGR/MON as well as OSD.

Expected results:
DistributedComputeHCIScaleOut has only OSD, not MGR/MON.
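The prefix match described above can be reproduced outside TripleO with a short, self-contained sketch. The `name_map` entries are copied from the rpdb output; the `HOST_FMT` string is an assumption inferred from the compiled pattern, and `current_pattern` / `suggested_pattern` are hypothetical helper names used only for this illustration, not functions from `ceph_spec_bootstrap.py`.

```python
import re

# Host name map copied from the rpdb output in this report.
NAME_MAP = {
    "dcn0-distributedcomputehci-0": "dcn0-computehci-0",
    "dcn0-distributedcomputehci-1": "dcn0-computehci-1",
    "dcn0-distributedcomputehci-2": "dcn0-computehci-2",
    "dcn0-distributedcomputehciscaleout-0": "dcn0-compute-0",
}

# ASSUMPTION: the role's host name format, inferred from the compiled regex
# '.*-distributedcomputehci'; the real value is not shown in this report.
HOST_FMT = "%stackname%-distributedcomputehci-%index%"


def current_pattern(host_fmt):
    """Build the pattern the way the quoted code does (drops '-%index%')."""
    return host_fmt.replace("%stackname%", ".*").replace("-%index%", "")


def suggested_pattern(host_fmt):
    """Build the pattern proposed in this report (keeps the trailing '-')."""
    return host_fmt.replace("%stackname%", ".*").replace("%index%", "")


def matching_hosts(pattern, name_map):
    """Return the mapped host names whose key matches the pattern."""
    reg = re.compile(pattern)
    return [name_map[host] for host in name_map if reg.match(host)]


if __name__ == "__main__":
    for build in (current_pattern, suggested_pattern):
        pat = build(HOST_FMT)
        print(repr(pat), matching_hosts(pat, NAME_MAP))
```

With the trailing hyphen kept, "dcn0-distributedcomputehciscaleout-0" no longer matches, because the character after "distributedcomputehci" is "s" rather than "-". Whether the upstream fix takes exactly this form is not stated in this report.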
JFYI, I'm pasting roles_data.yaml and overcloud-baremetal-deploy.yaml used to deploy ceph below:

~~~
(undercloud) [stack@undercloud ~]$ cat dcn0/dcn0_roles.yaml
###############################################################################
# File generated by TripleO
###############################################################################
###############################################################################
# Role: DistributedComputeHCI #
###############################################################################
- name: DistributedComputeHCI
  description: |
    Distributed Compute Node role with Ceph, Cinder volume, and Glance.
  tags:
    - compute
  networks:
    InternalApi:
      subnet: internal_api_subnet
    Tenant:
      subnet: tenant_subnet
    Storage:
      subnet: storage_subnet
    StorageMgmt:
      subnet: storage_mgmt_subnet
  RoleParametersDefault:
    FsAioMaxNumber: 1048576
    TunedProfileName: "throughput-performance"
  # CephOSD present so serial has to be 1
  update_serial: 1
  ServicesDefault:
    - OS::TripleO::Services::Aide
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::BarbicanClient
    - OS::TripleO::Services::BootParams
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CephGrafana
    - OS::TripleO::Services::CephMds
    - OS::TripleO::Services::CephMgr
    - OS::TripleO::Services::CephMon
    - OS::TripleO::Services::CephRbdMirror
    - OS::TripleO::Services::CephRgw
    - OS::TripleO::Services::CephOSD
    - OS::TripleO::Services::CinderVolumeEdge
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent
    - OS::TripleO::Services::ComputeNeutronMetadataAgent
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::Etcd
    - OS::TripleO::Services::Frr
    - OS::TripleO::Services::GlanceApiEdge
    - OS::TripleO::Services::IpaClient
    - OS::TripleO::Services::Ipsec
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::LoginDefs
    - OS::TripleO::Services::MetricsQdr
    - OS::TripleO::Services::Multipathd
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronBgpVpnBagpipe
    - OS::TripleO::Services::NeutronLinuxbridgeAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::NovaAZConfig
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    - OS::TripleO::Services::NovaLibvirtGuests
    - OS::TripleO::Services::NovaMigrationTarget
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::Podman
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::Rsyslog
    - OS::TripleO::Services::RsyslogSidecar
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timesync
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::OVNController
    - OS::TripleO::Services::OVNMetadataAgent
###############################################################################
# Role: DistributedComputeHCIScaleOut #
###############################################################################
- name: DistributedComputeHCIScaleOut
  description: |
    Distributed Compute Node role with CephOSD and HAproxy for Glance.
  tags:
    - compute
  networks:
    InternalApi:
      subnet: internal_api_subnet
    Tenant:
      subnet: tenant_subnet
    Storage:
      subnet: storage_subnet
    StorageMgmt:
      subnet: storage_mgmt_subnet
  RoleParametersDefault:
    FsAioMaxNumber: 1048576
    TunedProfileName: "throughput-performance"
  # CephOSD present so serial has to be 1
  update_serial: 1
  ServicesDefault:
    - OS::TripleO::Services::Aide
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::BarbicanClient
    - OS::TripleO::Services::BootParams
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CephOSD
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent
    - OS::TripleO::Services::ComputeNeutronMetadataAgent
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::Frr
    - OS::TripleO::Services::HAproxyEdge
    - OS::TripleO::Services::IpaClient
    - OS::TripleO::Services::Ipsec
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::LoginDefs
    - OS::TripleO::Services::MetricsQdr
    - OS::TripleO::Services::Multipathd
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronBgpVpnBagpipe
    - OS::TripleO::Services::NeutronLinuxbridgeAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::NovaAZConfig
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    - OS::TripleO::Services::NovaLibvirtGuests
    - OS::TripleO::Services::NovaMigrationTarget
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::Podman
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::Rsyslog
    - OS::TripleO::Services::RsyslogSidecar
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timesync
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::OVNController
    - OS::TripleO::Services::OVNMetadataAgent

(undercloud) [stack@undercloud ~]$ cat dcn0/overcloud-baremetal-deploy.yaml
- name: DistributedComputeHCI
  count: 3
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: external
      subnet: external_subnet
    - network: internal_api
      subnet: internal_api_subnet
    - network: storage
      subnet: storage_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_subnet
    - network: tenant
      subnet: tenant_subnet
    network_config:
      template: /home/stack/dcn0/two_interfaces.j2
      default_route_network:
      - external
  instances:
  - hostname: dcn0-computehci-0
    name: dcn0_computehci0
  - hostname: dcn0-computehci-1
    name: dcn0_computehci1
  - hostname: dcn0-computehci-2
    name: dcn0_computehci2
- name: DistributedComputeHCIScaleOut
  count: 1
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: external
      subnet: external_subnet
    - network: internal_api
      subnet: internal_api_subnet
    - network: storage
      subnet: storage_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_subnet
    - network: tenant
      subnet: tenant_subnet
    network_config:
      template: /home/stack/dcn0/two_interfaces.j2
      default_route_network:
      - external
  instances:
  - hostname: dcn0-compute-0
    name: dcn0_compute0
~~~
How to test:

On a DCN site with 3 DistributedComputeHCI nodes and at least 1 DistributedComputeHCIScaleOut node, deploy Ceph with the command `openstack overcloud ceph deploy`.

The DistributedComputeHCIScaleOut node should run only the OSD service, not the MON or MGR services. More generally, any node assigned to a role should run only the Ceph services listed for that role in [1].

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/roles/
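The check above could be scripted against saved `ceph orch ps` output. The following is only a sketch: it assumes the first two whitespace-separated fields of each daemon row are the daemon name and the host, as in the output shown in this report, and the function names `daemon_types_by_host` and `unexpected_daemons` are hypothetical.

```python
from collections import defaultdict


def daemon_types_by_host(orch_ps_text):
    """Map each host to the set of daemon types found in `ceph orch ps` rows."""
    types = defaultdict(set)
    for line in orch_ps_text.splitlines():
        fields = line.split()
        # Skip blank lines and non-daemon lines such as "Inferring fsid ...":
        # daemon names always contain a '.', e.g. "mon.dcn0-compute-0".
        if len(fields) < 2 or "." not in fields[0]:
            continue
        daemon_type = fields[0].split(".", 1)[0]
        types[fields[1]].add(daemon_type)
    return dict(types)


def unexpected_daemons(orch_ps_text, host, forbidden=("mon", "mgr")):
    """Return the forbidden daemon types that are running on the given host."""
    found = daemon_types_by_host(orch_ps_text).get(host, set())
    return sorted(found & set(forbidden))


if __name__ == "__main__":
    # A few rows shortened from the output in this report.
    sample = """Inferring fsid d961401d-50e0-50ac-a40f-ef07cbc752a6
crash.dcn0-compute-0       dcn0-compute-0     running (2h)
mgr.dcn0-compute-0.efkzri  dcn0-compute-0     running (2h)
mon.dcn0-compute-0         dcn0-compute-0     running (2h)
mon.dcn0-computehci-0      dcn0-computehci-0  running (2h)
"""
    print(unexpected_daemons(sample, "dcn0-compute-0"))
```

An empty list for every DistributedComputeHCIScaleOut host would indicate the expected result; on the affected deployment the scale-out host reports both `mgr` and `mon`.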