Bug 2223332 - Upgrade [OSP16.2 -> OSP17.1] cephadm got stuck at "ceph orch status" after ceph adoption in the "openstack overcloud upgrade" [NEEDINFO]
Summary: Upgrade [OSP16.2 -> OSP17.1] cephadm got stuck at "ceph orch status" after ceph adoption in the "openstack overcloud upgrade"
Keywords:
Status: NEW
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 6.2
Assignee: Adam King
QA Contact: Mohit Bisht
URL:
Whiteboard:
Depends On:
Blocks: 2160009 2222589
 
Reported: 2023-07-17 11:56 UTC by Manoj Katari
Modified: 2025-05-15 05:11 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2222589
Environment:
Last Closed:
Embargoed:
adking: needinfo? (gfidente)




Links:
Red Hat Issue Tracker RHCEPH-7017 (last updated 2023-07-17 11:57:50 UTC)

Description Manoj Katari 2023-07-17 11:56:50 UTC
+++ This bug was initially created as a clone of Bug #2222589 +++

During the FFU upgrade from OSP 16.2 to OSP 17.1, the openstack overcloud upgrade got stuck for more than 7 hours.

From overcloud_upgrade_run-computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud.log:

2023-07-12 23:18:44 | 2023-07-12 23:18:44.799776 | 525400d7-420c-ee98-3fcf-00000001a8f7 |         OK | Notify user about upcoming cephadm execution(s) | undercloud | result={^M
2023-07-12 23:18:44 |     "changed": false,^M
2023-07-12 23:18:44 |     "msg": "Running 1 cephadm playbook(s) (immediate log at /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log)"^M
2023-07-12 23:18:44 | }
2023-07-12 23:18:44 | 2023-07-12 23:18:44.802034 | 525400d7-420c-ee98-3fcf-00000001a8f7 |     TIMING | tripleo_run_cephadm : Notify user about upcoming cephadm execution(s) | undercloud | 0:29:31.578488 | 0.05s
2023-07-12 23:18:44 | 2023-07-12 23:18:44.818351 | 525400d7-420c-ee98-3fcf-00000001a8f8 |       TASK | run cephadm playbook




From /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log:

2023-07-12 23:19:26,778 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.777611 | 525400d7-420c-9c3f-b529-0000000001aa |       TASK | Get ceph_cli
2023-07-12 23:19:26,830 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.830146 | b6978dfe-0e19-493b-b2ec-c3f5c0f2d291 |   INCLUDED | /usr/share/ansible/roles/tripleo_cephadm/tasks/ceph_cli.yaml | controller-0
2023-07-12 23:19:26,853 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.852367 | 525400d7-420c-9c3f-b529-00000000037a |       TASK | Set ceph CLI
2023-07-12 23:19:26,914 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.913226 | 525400d7-420c-9c3f-b529-00000000037a |         OK | Set ceph CLI | controller-0
2023-07-12 23:19:26,936 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.935348 | 525400d7-420c-9c3f-b529-0000000001ab |       TASK | Get the ceph orchestrator status


Executing the following command did not return any result; it got stuck:
tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch status 
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
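
For context, a hang at "ceph orch status" usually points at the cephadm/orchestrator module inside the active mgr rather than at the cluster itself (mon-level commands still respond, as the ceph status output below shows). A minimal diagnostic sketch using only standard ceph CLI commands; the fsid and hostnames are the ones from this environment:

  # Which mgr is active, and is the cephadm module loaded?
  # (both are mon-handled commands, so they should answer even while orch calls hang)
  sudo cephadm shell -- ceph mgr stat
  sudo cephadm shell -- ceph mgr module ls

  # Pull the cephadm channel of the cluster log to see what the module was last doing
  sudo cephadm shell -- ceph log last cephadm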

--- Additional comment from Juan Badia Payno on 2023-07-13 08:11:47 UTC ---

The Jenkins Job https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-17.1-from-16.2-passed_phase2-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/79

--- Additional comment from Juan Badia Payno on 2023-07-13 08:13:32 UTC ---

CEPH STATUS
===========
tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph status 
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
  cluster:
    id:     1606aa1c-08f8-4e53-9b34-3c74181f65d5
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum controller-2,controller-1,controller-0 (age 8h)
    mgr: controller-0(active, since 8h), standbys: controller-1, controller-2
    osd: 15 osds: 15 up (since 8h), 15 in (since 6w)
 
  data:
    pools:   5 pools, 513 pgs
    objects: 584 objects, 1.8 GiB
    usage:   5.3 GiB used, 475 GiB / 480 GiB avail
    pgs:     513 active+clean
 

ERROR LOGS
===========
[tripleo-admin@controller-0 ~]$ sudo  grep -i health -r /var/log/ceph/
[tripleo-admin@controller-0 ~]$  grep -i error -r /var/log/ceph/*
grep: /var/log/ceph/1606aa1c-08f8-4e53-9b34-3c74181f65d5: Permission denied
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,883 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,888 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,000 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,007 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0


CONTAINERS
===========
[tripleo-admin@controller-0 ~]$ sudo podman ps 
CONTAINER ID  IMAGE                                                                                                                           COMMAND               CREATED      STATUS           PORTS       NAMES
824cbd62da39  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-horizon:16.2_20230526.1                                       kolla_start           6 weeks ago  Up 10 hours ago              horizon
f3d03c58ed52  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.2_20230526.1                                        kolla_start           6 weeks ago  Up 10 hours ago              iscsid
65971615e059  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-keystone:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              keystone
ce0f1e5ffa58  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1                            kolla_start           6 weeks ago  Up 10 hours ago              swift_object_expirer
35282fd69a26  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_replicator
5c5cf10e91ff  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_server
a2dddd6bccd6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_updater
92915cbcf7fa  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_rsync
c618e98e8a97  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              cinder_api
f655a318253d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              cinder_api_cron
eefb74ccd83b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-scheduler:16.2_20230526.1                              kolla_start           6 weeks ago  Up 10 hours ago              cinder_scheduler
e7dd5a227e0e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              heat_api
d4d492f3e803  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api-cfn:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              heat_api_cfn
d66201008e35  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              heat_api_cron
3544c63637b4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-engine:16.2_20230526.1                                   kolla_start           6 weeks ago  Up 10 hours ago              heat_engine
8ff76e38752a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:16.2_20230526.1                                          kolla_start           6 weeks ago  Up 10 hours ago              logrotate_crond
9dbba7d4b2f2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              neutron_api
0d84f6f18a2f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              nova_conductor
5b6e51882fc1  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-scheduler:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              nova_scheduler
5f9258688351  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-novncproxy:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              nova_vnc_proxy
f91655ecdba7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_auditor
942156362e53  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_reaper
84fd9898003e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_replicator
1bb99f2125f7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_server
599c4d4fc6e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_auditor
2e2cc698178b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_replicator
9422624a67e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_server
381786cee25d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_updater
36ccd0726ebc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_auditor
9568a44ca416  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-placement-api:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              placement_api
9b1c694cf8f0  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              glance_api
98bb550c241a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              glance_api_cron
f5462a1ba132  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_api
50f422579832  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_metadata
a7c7854b5c03  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1                            kolla_start           6 weeks ago  Up 10 hours ago              swift_proxy
2f8ca8cc4d10  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_api_cron
4999273b993b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    -n mon.controller...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mon-controller-0
de6b573beadc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    -n mgr.controller...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mgr-controller-0
985cc3701391  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n client.crash.c...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
63f9724035c9  cluster.common.tag/haproxy:pcmklatest                                                                                           /bin/bash /usr/lo...  8 hours ago  Up 8 hours ago               haproxy-bundle-podman-0
002ad445b88b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-memcached:17.1_20230711.2                                     kolla_start           7 hours ago  Up 7 hours ago               memcached
bd20fa130fdc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    --fsid 1606aa1c-0...  7 hours ago  Up 7 hours ago               ecstatic_napier


CEPH HEALTH
===========
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph health detail 
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
HEALTH_WARN mons are allowing insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.controller-2 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-1 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-0 has auth_allow_insecure_global_id_reclaim set to true
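
(The HEALTH_WARN above is the well-known insecure global_id reclaim warning and is most likely incidental to the hang; once all clients have been updated, it can normally be cleared with the standard config toggle:

  sudo cephadm shell -- ceph config set mon auth_allow_insecure_global_id_reclaim false
)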


SYSTEMCTL
=========
[tripleo-admin@controller-0 ~]$ sudo systemctl status ceph\*.service
● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mon.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:19:45 UTC; 8h ago
 Main PID: 353380 (conmon)
    Tasks: 30 (limit: 100744)
   Memory: 925.2M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─353392 /dev/init -- /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           │ └─353405 /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           └─supervisor
             └─353380 /usr/bin/conmon --api-version 1 -c 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -u 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fd>

Jul 13 06:49:02 controller-0 conmon[353380]: cluster 2023-07-13T06:49:00.696279+0000 mgr.controller-0 (mgr.64108) 15369 : cluster [DBG] pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700  0 mon.controller-0@2(peon) e2 handle_command mon_command({"prefix": "config dump", "format": "json"} v 0) v1
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700  0 log_channel(audit) log [DBG] : from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: cluster 2023-07-13T06:49:02.696968+0000 mgr.controller-0 (mgr.64108) 15370 : cluster [DBG] pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[353380]: audit 2023-07-13T06:49:03.945045+0000 mon.controller-0 (mon.2) 7472 : audit [DBG] from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: debug 2023-07-13T06:49:04.811+0000 7f8f9f84e700  0 mon.controller-0@2(peon) e2 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/osd_remove_queue}] v 0) v1
Jul 13 06:49:05 controller-0 conmon[353380]: cluster 2023-07-13T06:49:04.697700+0000 mgr.controller-0 (mgr.64108) 15371 : cluster [DBG] pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:05 controller-0 conmon[353380]: audit 2023-07-13T06:49:04.822347+0000 mon.controller-2 (mon.0) 10302 : audit [INF] from='mgr.64108 ' entity='mgr.controller-0' 
Jul 13 06:49:06 controller-0 conmon[353380]: debug 2023-07-13T06:49:06.422+0000 7f8fa2053700  1 mon.controller-0@2(peon).osd e226 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 71303168 kv_alloc: 872415232
Jul 13 06:49:07 controller-0 conmon[353380]: cluster 2023-07-13T06:49:06.698819+0000 mgr.controller-0 (mgr.64108) 15372 : cluster [DBG] pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:24:37 UTC; 8h ago
 Main PID: 373192 (conmon)
    Tasks: 4 (limit: 100744)
   Memory: 7.9M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─373205 /dev/init -- /usr/bin/ceph-crash -n client.crash.controller-0
           │ └─373237 /usr/libexec/platform-python -s /usr/bin/ceph-crash -n client.crash.controller-0
           └─supervisor
             └─373192 /usr/bin/conmon --api-version 1 -c 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -u 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156>

Jul 12 22:24:36 controller-0 systemd[1]: Starting Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5...
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.193182476 +0000 UTC m=+0.128684558 container create 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, >
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.137771526 +0000 UTC m=+0.073273609 image pull  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.382231705 +0000 UTC m=+0.317733796 container init 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, na>
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.408185636 +0000 UTC m=+0.343687734 container start 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, n>
Jul 12 22:24:37 controller-0 bash[372881]: 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725
Jul 12 22:24:37 controller-0 systemd[1]: Started Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5.
Jul 12 22:24:37 controller-0 conmon[373192]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mgr.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:20:34 UTC; 8h ago
 Main PID: 356559 (conmon)
    Tasks: 113 (limit: 100744)
   Memory: 465.5M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─356572 /dev/init -- /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug 
           │ ├─356584 /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug 
           │ ├─356901 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.30 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356902 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.32 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356903 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.13 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356904 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.35 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356906 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.36 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ └─356907 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.16 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           └─supervisor
             └─356559 /usr/bin/conmon --api-version 1 -c de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -u de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f>

Jul 13 06:49:00 controller-0 conmon[356559]: debug 2023-07-13T06:49:00.695+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:01 controller-0 conmon[356559]: debug 2023-07-13T06:49:01.837+0000 7f8f98c4a700  0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:02 controller-0 conmon[356559]: debug 2023-07-13T06:49:02.695+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.696+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.824+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev fa1ba7ce-42cf-4a59-b20b-cd803b331e20 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.825+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev 4df6b9cd-ea4b-45e7-8504-0f3efa3534e6 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.826+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev 82594c29-a3e5-4169-878f-5e82ddbb427f does not exist
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.697+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.841+0000 7f8f98c4a700  0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:08 controller-0 conmon[356559]: debug 2023-07-13T06:49:08.698+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15327: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail

--- Additional comment from Juan Badia Payno on 2023-07-13 08:25:11 UTC ---

It seems that there is a workaround:
 @controller-0 $ sudo cephadm shell -- ceph mgr fail controller-0

controller-0 is specified because it is the active mgr shown in the ceph status output.
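
A more generic form of the same workaround, for whichever node currently holds the active mgr (a sketch; it assumes jq is available on the host):

  # Look up the currently active mgr and fail over to a standby
  ACTIVE_MGR=$(sudo cephadm shell -- ceph mgr stat --format json | jq -r .active_name)
  sudo cephadm shell -- ceph mgr fail "${ACTIVE_MGR}"

  # A standby (controller-1 or controller-2 here) takes over, the cephadm module is
  # reloaded, and "ceph orch status" should start answering again
  sudo cephadm shell -- ceph orch status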

BTW, moving this to DFG:Storage

--- Additional comment from Juan Badia Payno on 2023-07-17 07:54:11 UTC ---

Possible workaround: https://review.opendev.org/c/openstack/tripleo-ansible/+/887565
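
Whatever the final fix in tripleo-ansible looks like, the general mitigation pattern (not necessarily what the review above implements) is to bound the orchestrator check with a timeout and retry once after failing the active mgr, roughly:

  # Sketch only: give "ceph orch status" a bounded amount of time instead of waiting forever
  if ! sudo timeout 60 cephadm shell -- ceph orch status; then
      # The orchestrator did not answer in time: fail over the active mgr and retry once.
      # With no daemon name, "ceph mgr fail" fails the currently active mgr on recent releases.
      sudo cephadm shell -- ceph mgr fail
      sleep 30
      sudo timeout 60 cephadm shell -- ceph orch status
  fi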

Comment 1 RHEL Program Management 2023-07-17 11:57:00 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Scott Ostapovicz 2023-07-17 14:38:53 UTC
Missed the window for 6.1 z1. Retargeting to 6.1 z2.

Comment 6 Scott Ostapovicz 2023-07-24 13:44:32 UTC
Adam, it looks like you are already on top of this. Please retarget it to 5.3 z5 if and when it is determined that a change is actually needed in a 5.x release.

