During the FFU upgrade from OSP 16.2 to OSP 17.1, openstack overcloud upgrade run got stuck for more than 7 hours.

From overcloud_upgrade_run-computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud.log:

2023-07-12 23:18:44 | 2023-07-12 23:18:44.799776 | 525400d7-420c-ee98-3fcf-00000001a8f7 | OK | Notify user about upcoming cephadm execution(s) | undercloud | result={
2023-07-12 23:18:44 |     "changed": false,
2023-07-12 23:18:44 |     "msg": "Running 1 cephadm playbook(s) (immediate log at /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log)"
2023-07-12 23:18:44 | }
2023-07-12 23:18:44 | 2023-07-12 23:18:44.802034 | 525400d7-420c-ee98-3fcf-00000001a8f7 | TIMING | tripleo_run_cephadm : Notify user about upcoming cephadm execution(s) | undercloud | 0:29:31.578488 | 0.05s
2023-07-12 23:18:44 | 2023-07-12 23:18:44.818351 | 525400d7-420c-ee98-3fcf-00000001a8f8 | TASK | run cephadm playbook

From /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log:

2023-07-12 23:19:26,778 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.777611 | 525400d7-420c-9c3f-b529-0000000001aa | TASK | Get ceph_cli
2023-07-12 23:19:26,830 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.830146 | b6978dfe-0e19-493b-b2ec-c3f5c0f2d291 | INCLUDED | /usr/share/ansible/roles/tripleo_cephadm/tasks/ceph_cli.yaml | controller-0
2023-07-12 23:19:26,853 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.852367 | 525400d7-420c-9c3f-b529-00000000037a | TASK | Set ceph CLI
2023-07-12 23:19:26,914 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.913226 | 525400d7-420c-9c3f-b529-00000000037a | OK | Set ceph CLI | controller-0
2023-07-12 23:19:26,936 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.935348 | 525400d7-420c-9c3f-b529-0000000001ab | TASK | Get the ceph orchestrator status

The last task logged is "Get the ceph orchestrator status". Executing the corresponding command manually did not return any result either; it also got stuck:

[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch status
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
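To narrow down whether the hang is in the orchestrator path or in the cluster itself, it can help to bound the calls with a timeout and compare them against a plain monitor query. A minimal sketch of the checks (same cephadm shell entry point as above; the 60s timeout value is arbitrary):

# plain "ceph -s" is answered by the monitors and should return quickly
[tripleo-admin@controller-0 ~]$ sudo timeout 60 cephadm shell -- ceph -s

# "ceph orch status" is served by the cephadm module in the active mgr;
# if this times out while "ceph -s" works, the mgr/orchestrator side is wedged
[tripleo-admin@controller-0 ~]$ sudo timeout 60 cephadm shell -- ceph orch status

# exit code 124 from timeout means the command hung rather than failed
[tripleo-admin@controller-0 ~]$ echo $?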
CEPH STATUS
===========
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph status
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
  cluster:
    id:     1606aa1c-08f8-4e53-9b34-3c74181f65d5
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum controller-2,controller-1,controller-0 (age 8h)
    mgr: controller-0(active, since 8h), standbys: controller-1, controller-2
    osd: 15 osds: 15 up (since 8h), 15 in (since 6w)

  data:
    pools:   5 pools, 513 pgs
    objects: 584 objects, 1.8 GiB
    usage:   5.3 GiB used, 475 GiB / 480 GiB avail
    pgs:     513 active+clean

ERROR LOGS
===========
[tripleo-admin@controller-0 ~]$ sudo grep -i health -r /var/log/ceph/
[tripleo-admin@controller-0 ~]$ grep -i error -r /var/log/ceph/*
grep: /var/log/ceph/1606aa1c-08f8-4e53-9b34-3c74181f65d5: Permission denied
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,883 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,888 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,000 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,007 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0

CONTAINERS
===========
[tripleo-admin@controller-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
824cbd62da39  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-horizon:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  horizon
f3d03c58ed52  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  iscsid
65971615e059  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-keystone:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  keystone
ce0f1e5ffa58  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_object_expirer
35282fd69a26  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_object_replicator
5c5cf10e91ff  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_object_server
a2dddd6bccd6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_object_updater
92915cbcf7fa  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_rsync
c618e98e8a97  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  cinder_api
f655a318253d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  cinder_api_cron
eefb74ccd83b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-scheduler:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  cinder_scheduler
e7dd5a227e0e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  heat_api
d4d492f3e803  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api-cfn:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  heat_api_cfn
d66201008e35  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  heat_api_cron
3544c63637b4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-engine:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  heat_engine
8ff76e38752a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  logrotate_crond
9dbba7d4b2f2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  neutron_api
0d84f6f18a2f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  nova_conductor
5b6e51882fc1  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-scheduler:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  nova_scheduler
5f9258688351  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-novncproxy:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  nova_vnc_proxy
f91655ecdba7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_account_auditor
942156362e53  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_account_reaper
84fd9898003e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_account_replicator
1bb99f2125f7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_account_server
599c4d4fc6e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_container_auditor
2e2cc698178b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_container_replicator
9422624a67e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_container_server
381786cee25d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_container_updater
36ccd0726ebc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_object_auditor
9568a44ca416  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-placement-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  placement_api
9b1c694cf8f0  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  glance_api
98bb550c241a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  glance_api_cron
f5462a1ba132  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  nova_api
50f422579832  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  nova_metadata
a7c7854b5c03  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  swift_proxy
2f8ca8cc4d10  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1  kolla_start  6 weeks ago  Up 10 hours ago  nova_api_cron
4999273b993b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404  -n mon.controller...  8 hours ago  Up 8 hours ago  ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mon-controller-0
de6b573beadc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404  -n mgr.controller...  8 hours ago  Up 8 hours ago  ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mgr-controller-0
985cc3701391  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n client.crash.c...  8 hours ago  Up 8 hours ago  ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
63f9724035c9  cluster.common.tag/haproxy:pcmklatest  /bin/bash /usr/lo...  8 hours ago  Up 8 hours ago  haproxy-bundle-podman-0
002ad445b88b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-memcached:17.1_20230711.2  kolla_start  7 hours ago  Up 7 hours ago  memcached
bd20fa130fdc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404  --fsid 1606aa1c-0...  7 hours ago  Up 7 hours ago  ecstatic_napier

CEPH HEALTH
===========
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph health detail
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
HEALTH_WARN mons are allowing insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.controller-2 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-1 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-0 has auth_allow_insecure_global_id_reclaim set to true

SYSTEMCTL
=========
[tripleo-admin@controller-0 ~]$ sudo systemctl status ceph\*.service
● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mon.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:19:45 UTC; 8h ago
 Main PID: 353380 (conmon)
    Tasks: 30 (limit: 100744)
   Memory: 925.2M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─353392 /dev/init -- /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           │ └─353405 /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           └─supervisor
             └─353380 /usr/bin/conmon --api-version 1 -c 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -u 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fd>

Jul 13 06:49:02 controller-0 conmon[353380]: cluster 2023-07-13T06:49:00.696279+0000 mgr.controller-0 (mgr.64108) 15369 : cluster [DBG] pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700 0 mon.controller-0@2(peon) e2 handle_command mon_command({"prefix": "config dump", "format": "json"} v 0) v1
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700 0 log_channel(audit) log [DBG] : from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: cluster 2023-07-13T06:49:02.696968+0000 mgr.controller-0 (mgr.64108) 15370 : cluster [DBG] pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[353380]: audit 2023-07-13T06:49:03.945045+0000 mon.controller-0 (mon.2) 7472 : audit [DBG] from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: debug 2023-07-13T06:49:04.811+0000 7f8f9f84e700 0 mon.controller-0@2(peon) e2 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/osd_remove_queue}] v 0) v1
Jul 13 06:49:05 controller-0 conmon[353380]: cluster 2023-07-13T06:49:04.697700+0000 mgr.controller-0 (mgr.64108) 15371 : cluster [DBG] pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:05 controller-0 conmon[353380]: audit 2023-07-13T06:49:04.822347+0000 mon.controller-2 (mon.0) 10302 : audit [INF] from='mgr.64108 ' entity='mgr.controller-0'
Jul 13 06:49:06 controller-0 conmon[353380]: debug 2023-07-13T06:49:06.422+0000 7f8fa2053700 1 mon.controller-0@2(peon).osd e226 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 71303168 kv_alloc: 872415232
Jul 13 06:49:07 controller-0 conmon[353380]: cluster 2023-07-13T06:49:06.698819+0000 mgr.controller-0 (mgr.64108) 15372 : cluster [DBG] pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:24:37 UTC; 8h ago
 Main PID: 373192 (conmon)
    Tasks: 4 (limit: 100744)
   Memory: 7.9M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─373205 /dev/init -- /usr/bin/ceph-crash -n client.crash.controller-0
           │ └─373237 /usr/libexec/platform-python -s /usr/bin/ceph-crash -n client.crash.controller-0
           └─supervisor
             └─373192 /usr/bin/conmon --api-version 1 -c 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -u 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156>

Jul 12 22:24:36 controller-0 systemd[1]: Starting Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5...
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.193182476 +0000 UTC m=+0.128684558 container create 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, >
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.137771526 +0000 UTC m=+0.073273609 image pull undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.382231705 +0000 UTC m=+0.317733796 container init 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, na>
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.408185636 +0000 UTC m=+0.343687734 container start 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, n>
Jul 12 22:24:37 controller-0 bash[372881]: 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725
Jul 12 22:24:37 controller-0 systemd[1]: Started Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5.
Jul 12 22:24:37 controller-0 conmon[373192]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mgr.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:20:34 UTC; 8h ago
 Main PID: 356559 (conmon)
    Tasks: 113 (limit: 100744)
   Memory: 465.5M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─356572 /dev/init -- /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
           │ ├─356584 /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
           │ ├─356901 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.30 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356902 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.32 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356903 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.13 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356904 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.35 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356906 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.36 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ └─356907 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.16 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           └─supervisor
             └─356559 /usr/bin/conmon --api-version 1 -c de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -u de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f>

Jul 13 06:49:00 controller-0 conmon[356559]: debug 2023-07-13T06:49:00.695+0000 7f8fa5ee4700 0 log_channel(cluster) log [DBG] : pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:01 controller-0 conmon[356559]: debug 2023-07-13T06:49:01.837+0000 7f8f98c4a700 0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:02 controller-0 conmon[356559]: debug 2023-07-13T06:49:02.695+0000 7f8fa5ee4700 0 log_channel(cluster) log [DBG] : pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.696+0000 7f8fa5ee4700 0 log_channel(cluster) log [DBG] : pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.824+0000 7f8f9ccd2700 0 [progress WARNING root] complete: ev fa1ba7ce-42cf-4a59-b20b-cd803b331e20 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.825+0000 7f8f9ccd2700 0 [progress WARNING root] complete: ev 4df6b9cd-ea4b-45e7-8504-0f3efa3534e6 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.826+0000 7f8f9ccd2700 0 [progress WARNING root] complete: ev 82594c29-a3e5-4169-878f-5e82ddbb427f does not exist
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.697+0000 7f8fa5ee4700 0 log_channel(cluster) log [DBG] : pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.841+0000 7f8f98c4a700 0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:08 controller-0 conmon[356559]: debug 2023-07-13T06:49:08.698+0000 7f8fa5ee4700 0 log_channel(cluster) log [DBG] : pgmap v15327: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
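Note the ssh child processes under the mgr unit above: those are the cephadm mgr module's connections out to the other hosts, so the hang appears to be inside the cephadm/orchestrator module of the active mgr rather than in the mons or OSDs (which answer fine per the status output). A few read-only probes that avoid the hanging ceph orch path, as a sketch (standard Ceph CLI commands; output format may differ slightly on this Pacific-based image):

# which mgr is active and whether the cephadm module is enabled
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph mgr dump | grep -E '"active_name"|cephadm'
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph mgr module ls | head -20

# recent messages from the cephadm log channel, if any were recorded
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph log last 50 debug cephadm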
It seems that there is a workaround:

@controller-0 $ sudo cephadm shell -- ceph mgr fail controller-0

controller-0 is used because it is the active mgr shown in the ceph status output above.

BTW, moving this to DFG:Storage.
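Spelled out a bit more, the workaround sequence would look roughly like this (a sketch; the mgr name must be whatever ceph -s reports as active at the time, and a standby mgr has to exist to take over; the 60s timeout is arbitrary):

# confirm which mgr is currently active and that standbys are available
@controller-0 $ sudo cephadm shell -- ceph -s | grep 'mgr:'

# fail the active mgr so a standby takes over and the cephadm/orchestrator
# module gets reloaded in the new active mgr
@controller-0 $ sudo cephadm shell -- ceph mgr fail controller-0

# the orchestrator should start answering again shortly afterwards
@controller-0 $ sudo timeout 60 cephadm shell -- ceph orch status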
Possible workaround: https://review.opendev.org/c/openstack/tripleo-ansible/+/887565
Tested the fix in regular Ceph deployments and observed no regression. We still need to verify it in the different upgrade scenarios.
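For the upgrade scenarios, a rough verification sketch (paths and prompts are the ones from this environment and will differ elsewhere; the timeout value is arbitrary) is to re-run the probe that was hanging once the "run cephadm playbook" step has passed:

# the stuck task was "Get the ceph orchestrator status", so this is the key probe
[tripleo-admin@controller-0 ~]$ sudo timeout 120 cephadm shell -- ceph orch status

# and confirm the cephadm playbook log moved past the previously hanging task
[stack@undercloud-0 ~]$ tail -n 20 /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log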
FWIW I did not encounter this issue in OSPdO initially. When I added DeployedCeph: true I hit this issue. Removing it again resolved the issue.

Discussing on Slack with fultonj yesterday: "you should be setting DeployedCeph: false... It means you used tripleo client to deploy ceph"
(In reply to Ollie Walsh from comment #9)
> FWIW I did not encounter this issue in OSPdO initially. When I added
> DeployedCeph: true I hit this issue. Removing it again resolved the issue.
>
> Discussing on slack with fultonj yesterday "you should be setting
> DeployedCeph: false... It means you used tripleo client to deploy ceph"

But OSPdO is a special case which is using Heat to trigger cephadm because it's not using the python tripleo client.
Here are the details on what DeployedCeph does:

https://github.com/openstack/tripleo-ansible/commit/35e455ad52751ae334a614a03c8257da2f30e038
https://github.com/openstack/tripleo-heat-templates/commit/feb93b26b42673d1958917603f2498b9456fb34c
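For anyone toggling this while reproducing comment #9: DeployedCeph is a tripleo-heat-templates boolean set in an environment file passed to the deployment. A minimal sketch (the file name and location here are arbitrary examples, not from this environment):

# environment file passed to the overcloud deployment with -e;
# DeployedCeph: true means Ceph was already deployed beforehand with the
# tripleo client ("openstack overcloud ceph deploy"), false means it was not
cat > ~/deployed-ceph-toggle.yaml <<'EOF'
parameter_defaults:
  DeployedCeph: false
EOF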
Doc text looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:5138