Bug 2223332 - Upgrade [OSP16.2 -> OSP17.1] cephadm got stuck at "ceph orch status" after ceph adoption in the "openstack overcloud upgrade" [NEEDINFO]
Summary: Upgrade [OSP16.2 -> OSP17.1] cephadm got stuck at "ceph orch status" after ceph adoption in the "openstack overcloud upgrade"
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 6.2
Assignee: Adam King
QA Contact: Mohit Bisht
URL:
Whiteboard:
Depends On:
Blocks: 2160009 2222589
 
Reported: 2023-07-17 11:56 UTC by Manoj Katari
Modified: 2025-08-28 10:18 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2222589
Environment:
Last Closed: 2025-08-28 10:18:57 UTC
Embargoed:
adking: needinfo? (gfidente)


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-7017 0 None None None 2023-07-17 11:57:50 UTC

Description Manoj Katari 2023-07-17 11:56:50 UTC
+++ This bug was initially created as a clone of Bug #2222589 +++

During the FFU upgrade from OSP 16.2 to OSP 17.1, the openstack overcloud upgrade command got stuck for more than 7 hours.

From overcloud_upgrade_run-computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud.log:

2023-07-12 23:18:44 | 2023-07-12 23:18:44.799776 | 525400d7-420c-ee98-3fcf-00000001a8f7 |         OK | Notify user about upcoming cephadm execution(s) | undercloud | result={^M
2023-07-12 23:18:44 |     "changed": false,^M
2023-07-12 23:18:44 |     "msg": "Running 1 cephadm playbook(s) (immediate log at /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log)"^M
2023-07-12 23:18:44 | }
2023-07-12 23:18:44 | 2023-07-12 23:18:44.802034 | 525400d7-420c-ee98-3fcf-00000001a8f7 |     TIMING | tripleo_run_cephadm : Notify user about upcoming cephadm execution(s) | undercloud | 0:29:31.578488 | 0.05s
2023-07-12 23:18:44 | 2023-07-12 23:18:44.818351 | 525400d7-420c-ee98-3fcf-00000001a8f8 |       TASK | run cephadm playbook




From /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log:

2023-07-12 23:19:26,778 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.777611 | 525400d7-420c-9c3f-b529-0000000001aa |       TASK | Get ceph_cli
2023-07-12 23:19:26,830 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.830146 | b6978dfe-0e19-493b-b2ec-c3f5c0f2d291 |   INCLUDED | /usr/share/ansible/roles/tripleo_cephadm/tasks/ceph_cli.yaml | controller-0
2023-07-12 23:19:26,853 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.852367 | 525400d7-420c-9c3f-b529-00000000037a |       TASK | Set ceph CLI
2023-07-12 23:19:26,914 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.913226 | 525400d7-420c-9c3f-b529-00000000037a |         OK | Set ceph CLI | controller-0
2023-07-12 23:19:26,936 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.935348 | 525400d7-420c-9c3f-b529-0000000001ab |       TASK | Get the ceph orchestrator status


Executing the following command did not return any result; it simply hung:
tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch status 
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
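
A way to confirm the hang sits in the mgr orchestrator path rather than in cephadm itself (diagnostic commands suggested here, not captured during the run) is to bound the call with a timeout and check the mgr module state separately:

[tripleo-admin@controller-0 ~]$ sudo timeout 120 cephadm shell -- ceph orch status --format json-pretty; echo "rc=$?"   # rc=124 means the call timed out
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph mgr module ls | head -20                                     # the cephadm/orchestrator module should show as enabled

If the first command times out while ceph status and ceph mgr module ls answer promptly, the active mgr's cephadm module is the main suspect.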

--- Additional comment from Juan Badia Payno on 2023-07-13 08:11:47 UTC ---

The Jenkins Job https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-17.1-from-16.2-passed_phase2-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/79

--- Additional comment from Juan Badia Payno on 2023-07-13 08:13:32 UTC ---

CEPH STATUS
===========
tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph status 
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
  cluster:
    id:     1606aa1c-08f8-4e53-9b34-3c74181f65d5
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum controller-2,controller-1,controller-0 (age 8h)
    mgr: controller-0(active, since 8h), standbys: controller-1, controller-2
    osd: 15 osds: 15 up (since 8h), 15 in (since 6w)
 
  data:
    pools:   5 pools, 513 pgs
    objects: 584 objects, 1.8 GiB
    usage:   5.3 GiB used, 475 GiB / 480 GiB avail
    pgs:     513 active+clean
 

ERROR LOGS
===========
[tripleo-admin@controller-0 ~]$ sudo  grep -i health -r /var/log/ceph/
[tripleo-admin@controller-0 ~]$  grep -i error -r /var/log/ceph/*
grep: /var/log/ceph/1606aa1c-08f8-4e53-9b34-3c74181f65d5: Permission denied
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,883 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,888 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,000 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,007 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0
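
Note that the second grep above ran without sudo, hence the Permission denied on the fsid directory; re-running it with sudo and checking which crash container name actually exists would cover that gap (suggested follow-up, not output from this run):

[tripleo-admin@controller-0 ~]$ sudo grep -ri error /var/log/ceph/1606aa1c-08f8-4e53-9b34-3c74181f65d5/
[tripleo-admin@controller-0 ~]$ sudo podman ps --format '{{.Names}}' | grep -i crash   # cephadm looked for both crash-controller-0 and crash.controller-0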


CONTAINERS
===========
[tripleo-admin@controller-0 ~]$ sudo podman ps 
CONTAINER ID  IMAGE                                                                                                                           COMMAND               CREATED      STATUS           PORTS       NAMES
824cbd62da39  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-horizon:16.2_20230526.1                                       kolla_start           6 weeks ago  Up 10 hours ago              horizon
f3d03c58ed52  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.2_20230526.1                                        kolla_start           6 weeks ago  Up 10 hours ago              iscsid
65971615e059  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-keystone:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              keystone
ce0f1e5ffa58  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1                            kolla_start           6 weeks ago  Up 10 hours ago              swift_object_expirer
35282fd69a26  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_replicator
5c5cf10e91ff  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_server
a2dddd6bccd6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_updater
92915cbcf7fa  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_rsync
c618e98e8a97  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              cinder_api
f655a318253d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              cinder_api_cron
eefb74ccd83b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-scheduler:16.2_20230526.1                              kolla_start           6 weeks ago  Up 10 hours ago              cinder_scheduler
e7dd5a227e0e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              heat_api
d4d492f3e803  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api-cfn:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              heat_api_cfn
d66201008e35  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              heat_api_cron
3544c63637b4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-engine:16.2_20230526.1                                   kolla_start           6 weeks ago  Up 10 hours ago              heat_engine
8ff76e38752a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:16.2_20230526.1                                          kolla_start           6 weeks ago  Up 10 hours ago              logrotate_crond
9dbba7d4b2f2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              neutron_api
0d84f6f18a2f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              nova_conductor
5b6e51882fc1  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-scheduler:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              nova_scheduler
5f9258688351  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-novncproxy:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              nova_vnc_proxy
f91655ecdba7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_auditor
942156362e53  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_reaper
84fd9898003e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_replicator
1bb99f2125f7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_server
599c4d4fc6e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_auditor
2e2cc698178b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_replicator
9422624a67e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_server
381786cee25d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_updater
36ccd0726ebc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_auditor
9568a44ca416  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-placement-api:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              placement_api
9b1c694cf8f0  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              glance_api
98bb550c241a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              glance_api_cron
f5462a1ba132  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_api
50f422579832  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_metadata
a7c7854b5c03  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1                            kolla_start           6 weeks ago  Up 10 hours ago              swift_proxy
2f8ca8cc4d10  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_api_cron
4999273b993b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    -n mon.controller...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mon-controller-0
de6b573beadc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    -n mgr.controller...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mgr-controller-0
985cc3701391  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n client.crash.c...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
63f9724035c9  cluster.common.tag/haproxy:pcmklatest                                                                                           /bin/bash /usr/lo...  8 hours ago  Up 8 hours ago               haproxy-bundle-podman-0
002ad445b88b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-memcached:17.1_20230711.2                                     kolla_start           7 hours ago  Up 7 hours ago               memcached
bd20fa130fdc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    --fsid 1606aa1c-0...  7 hours ago  Up 7 hours ago               ecstatic_napier
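
The randomly named ecstatic_napier container at the bottom was started with --fsid 1606aa1c-0... and is most likely the stuck cephadm shell session itself; inspecting it would show exactly what it is blocked on (illustrative commands, not captured here):

[tripleo-admin@controller-0 ~]$ sudo podman inspect --format '{{.Config.Cmd}}' ecstatic_napier   # full command line the container is running
[tripleo-admin@controller-0 ~]$ sudo podman top ecstatic_napier                                  # processes inside the stuck shell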


CEPH HEALTH
===========
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph health detail 
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
HEALTH_WARN mons are allowing insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.controller-2 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-1 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-0 has auth_allow_insecure_global_id_reclaim set to true
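
The AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED warning is a separate, well-known condition and almost certainly unrelated to the hang; once all clients reconnect securely it can be cleared with the usual knob (not applied during this run):

[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph config set mon auth_allow_insecure_global_id_reclaim false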


SYSTEMCTL
=========
[tripleo-admin@controller-0 ~]$ sudo systemctl status ceph\*.service
● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mon.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:19:45 UTC; 8h ago
 Main PID: 353380 (conmon)
    Tasks: 30 (limit: 100744)
   Memory: 925.2M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─353392 /dev/init -- /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           │ └─353405 /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           └─supervisor
             └─353380 /usr/bin/conmon --api-version 1 -c 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -u 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fd>

Jul 13 06:49:02 controller-0 conmon[353380]: cluster 2023-07-13T06:49:00.696279+0000 mgr.controller-0 (mgr.64108) 15369 : cluster [DBG] pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700  0 mon.controller-0@2(peon) e2 handle_command mon_command({"prefix": "config dump", "format": "json"} v 0) v1
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700  0 log_channel(audit) log [DBG] : from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: cluster 2023-07-13T06:49:02.696968+0000 mgr.controller-0 (mgr.64108) 15370 : cluster [DBG] pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[353380]: audit 2023-07-13T06:49:03.945045+0000 mon.controller-0 (mon.2) 7472 : audit [DBG] from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: debug 2023-07-13T06:49:04.811+0000 7f8f9f84e700  0 mon.controller-0@2(peon) e2 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/osd_remove_queue}] v 0) v1
Jul 13 06:49:05 controller-0 conmon[353380]: cluster 2023-07-13T06:49:04.697700+0000 mgr.controller-0 (mgr.64108) 15371 : cluster [DBG] pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:05 controller-0 conmon[353380]: audit 2023-07-13T06:49:04.822347+0000 mon.controller-2 (mon.0) 10302 : audit [INF] from='mgr.64108 ' entity='mgr.controller-0' 
Jul 13 06:49:06 controller-0 conmon[353380]: debug 2023-07-13T06:49:06.422+0000 7f8fa2053700  1 mon.controller-0@2(peon).osd e226 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 71303168 kv_alloc: 872415232
Jul 13 06:49:07 controller-0 conmon[353380]: cluster 2023-07-13T06:49:06.698819+0000 mgr.controller-0 (mgr.64108) 15372 : cluster [DBG] pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:24:37 UTC; 8h ago
 Main PID: 373192 (conmon)
    Tasks: 4 (limit: 100744)
   Memory: 7.9M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─373205 /dev/init -- /usr/bin/ceph-crash -n client.crash.controller-0
           │ └─373237 /usr/libexec/platform-python -s /usr/bin/ceph-crash -n client.crash.controller-0
           └─supervisor
             └─373192 /usr/bin/conmon --api-version 1 -c 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -u 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156>

Jul 12 22:24:36 controller-0 systemd[1]: Starting Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5...
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.193182476 +0000 UTC m=+0.128684558 container create 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, >
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.137771526 +0000 UTC m=+0.073273609 image pull  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.382231705 +0000 UTC m=+0.317733796 container init 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, na>
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.408185636 +0000 UTC m=+0.343687734 container start 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, n>
Jul 12 22:24:37 controller-0 bash[372881]: 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725
Jul 12 22:24:37 controller-0 systemd[1]: Started Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5.
Jul 12 22:24:37 controller-0 conmon[373192]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mgr.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:20:34 UTC; 8h ago
 Main PID: 356559 (conmon)
    Tasks: 113 (limit: 100744)
   Memory: 465.5M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─356572 /dev/init -- /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug 
           │ ├─356584 /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug 
           │ ├─356901 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.30 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356902 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.32 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356903 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.13 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356904 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.35 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356906 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.36 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ └─356907 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.16 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           └─supervisor
             └─356559 /usr/bin/conmon --api-version 1 -c de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -u de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f>

Jul 13 06:49:00 controller-0 conmon[356559]: debug 2023-07-13T06:49:00.695+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:01 controller-0 conmon[356559]: debug 2023-07-13T06:49:01.837+0000 7f8f98c4a700  0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:02 controller-0 conmon[356559]: debug 2023-07-13T06:49:02.695+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.696+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.824+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev fa1ba7ce-42cf-4a59-b20b-cd803b331e20 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.825+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev 4df6b9cd-ea4b-45e7-8504-0f3efa3534e6 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.826+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev 82594c29-a3e5-4169-878f-5e82ddbb427f does not exist
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.697+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.841+0000 7f8f98c4a700  0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:08 controller-0 conmon[356559]: debug 2023-07-13T06:49:08.698+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15327: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
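
Since the active mgr's cephadm module is what serves ceph orch status (note the six ssh sessions it is holding to the ceph-admin hosts above), its cluster-log channel and the mgr container log are the next places to look (suggested commands, not run as part of this triage):

[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph log last 100 debug cephadm
[tripleo-admin@controller-0 ~]$ sudo podman logs --since 1h ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mgr-controller-0 2>&1 | grep -i cephadm | tail -50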

--- Additional comment from Juan Badia Payno on 2023-07-13 08:25:11 UTC ---

It seems that there is a workaround:
 @controller-0 $ sudo cephadm shell -- ceph mgr fail controller-0

controller-0 is used because it is the active mgr shown in the ceph status output above.
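
After failing the active mgr, a standby should take over with a freshly loaded cephadm module; verifying the failover and that the orchestrator answers again would look like this (follow-up sketch, assuming the cluster state above):

 @controller-0 $ sudo cephadm shell -- ceph mgr stat                        # active mgr should now be controller-1 or controller-2
 @controller-0 $ sudo timeout 120 cephadm shell -- ceph orch status         # should return within seconds instead of hanging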

BTW, moving this to DFG:Storage

--- Additional comment from Juan Badia Payno on 2023-07-17 07:54:11 UTC ---

Possible workaround: https://review.opendev.org/c/openstack/tripleo-ansible/+/887565
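
Until that change merges, a rough local guard can approximate it by bounding the orchestrator check and failing the active mgr if it never answers (a sketch only; the actual review may take a different approach):

$ sudo timeout 300 cephadm shell -- ceph orch status || sudo cephadm shell -- ceph mgr fail   # ceph mgr fail with no argument fails the currently active mgr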

Comment 1 RHEL Program Management 2023-07-17 11:57:00 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Scott Ostapovicz 2023-07-17 14:38:53 UTC
Missed the window for 6.1 z1. Retargeting to 6.1 z2.

Comment 6 Scott Ostapovicz 2023-07-24 13:44:32 UTC
Adam, it looks like you are already on top of this. Please retarget it to 5.3 z5 if and when it is determined that a change is actually needed in a 5.x release.

Comment 10 Sahina Bose 2025-08-28 10:18:57 UTC
Closing this bug as part of bulk closing of bugs that have been open for more than 2 years without any significant updates for the last 3 months. Please reopen with justification if you think this bug is still relevant and needs to be addressed in an upcoming release.

