Bug 2222589 - Upgrade [OSP16.2 -> OSP17.1] After ceph adoption, cephadm stops at 'ceph orch status'
Summary: Upgrade [OSP16.2 -> OSP17.1] After ceph adoption, cephadm stops at 'ceph orch status'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 17.1
Assignee: Manoj Katari
QA Contact: Khomesh Thakre
URL:
Whiteboard:
Depends On: 2223332
Blocks:
 
Reported: 2023-07-13 08:08 UTC by Juan Badia Payno
Modified: 2023-09-20 00:30 UTC
CC List: 13 users

Fixed In Version: tripleo-ansible-3.3.1-1.20230518201536.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, during the upgrade from RHOSP 16.2 to 17.1, the director upgrade script stopped executing when upgrading Red Hat Ceph Storage 4 to 5 in a director-deployed Ceph Storage environment that used IPv6. This issue is resolved in RHOSP 17.1.1.
Clone Of:
: 2223332
Environment:
Last Closed: 2023-09-20 00:29:46 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 887565 0 None MERGED Fix ceph orch command stuck issue 2023-07-20 10:26:17 UTC
Red Hat Issue Tracker OSP-26578 0 None None None 2023-07-13 08:10:37 UTC
Red Hat Product Errata RHBA-2023:5138 0 None None None 2023-09-20 00:30:04 UTC

Description Juan Badia Payno 2023-07-13 08:08:56 UTC
During the FFU upgrade from OSP 16.2 to OSP 17.1, the openstack overcloud upgrade run got stuck for more than 7 hours.

From overcloud_upgrade_run-computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud.log:

2023-07-12 23:18:44 | 2023-07-12 23:18:44.799776 | 525400d7-420c-ee98-3fcf-00000001a8f7 |         OK | Notify user about upcoming cephadm execution(s) | undercloud | result={^M
2023-07-12 23:18:44 |     "changed": false,^M
2023-07-12 23:18:44 |     "msg": "Running 1 cephadm playbook(s) (immediate log at /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log)"^M
2023-07-12 23:18:44 | }
2023-07-12 23:18:44 | 2023-07-12 23:18:44.802034 | 525400d7-420c-ee98-3fcf-00000001a8f7 |     TIMING | tripleo_run_cephadm : Notify user about upcoming cephadm execution(s) | undercloud | 0:29:31.578488 | 0.05s
2023-07-12 23:18:44 | 2023-07-12 23:18:44.818351 | 525400d7-420c-ee98-3fcf-00000001a8f8 |       TASK | run cephadm playbook




From /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log:

2023-07-12 23:19:26,778 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.777611 | 525400d7-420c-9c3f-b529-0000000001aa |       TASK | Get ceph_cli
2023-07-12 23:19:26,830 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.830146 | b6978dfe-0e19-493b-b2ec-c3f5c0f2d291 |   INCLUDED | /usr/share/ansible/roles/tripleo_cephadm/tasks/ceph_cli.yaml | controller-0
2023-07-12 23:19:26,853 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.852367 | 525400d7-420c-9c3f-b529-00000000037a |       TASK | Set ceph CLI
2023-07-12 23:19:26,914 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.913226 | 525400d7-420c-9c3f-b529-00000000037a |         OK | Set ceph CLI | controller-0
2023-07-12 23:19:26,936 p=463425 u=stack n=ansible | 2023-07-12 23:19:26.935348 | 525400d7-420c-9c3f-b529-0000000001ab |       TASK | Get the ceph orchestrator status
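
For reference, while the playbook appears hung, the cephadm log can be followed live on the undercloud (path taken from the task output above) to confirm that no further tasks are being logged:

(undercloud) $ tail -f /home/stack/overcloud-deploy/qe-Cloud-0/config-download/qe-Cloud-0/cephadm/cephadm_command.log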


Executing the following command did not return any result; it got stuck:
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch status
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
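
(A simple way to re-run the check without blocking a shell indefinitely is to bound it with coreutils timeout; this is only a diagnostic sketch, not part of the reported reproduction.)

[tripleo-admin@controller-0 ~]$ sudo timeout 120 cephadm shell -- ceph orch status || echo "ceph orch status did not return within 120s"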

Comment 2 Juan Badia Payno 2023-07-13 08:13:32 UTC
CEPH STATUS
===========
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph status
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
  cluster:
    id:     1606aa1c-08f8-4e53-9b34-3c74181f65d5
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum controller-2,controller-1,controller-0 (age 8h)
    mgr: controller-0(active, since 8h), standbys: controller-1, controller-2
    osd: 15 osds: 15 up (since 8h), 15 in (since 6w)
 
  data:
    pools:   5 pools, 513 pgs
    objects: 584 objects, 1.8 GiB
    usage:   5.3 GiB used, 475 GiB / 480 GiB avail
    pgs:     513 active+clean
 

ERROR LOGS
===========
[tripleo-admin@controller-0 ~]$ sudo  grep -i health -r /var/log/ceph/
[tripleo-admin@controller-0 ~]$  grep -i error -r /var/log/ceph/*
grep: /var/log/ceph/1606aa1c-08f8-4e53-9b34-3c74181f65d5: Permission denied
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,883 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:34,888 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,000 7fe016b5eb80 DEBUG /bin/podman: Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0
/var/log/ceph/cephadm.log:2023-07-12 22:24:35,007 7fe016b5eb80 INFO /bin/podman: stderr Error: error inspecting object: no such container ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash.controller-0
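
(The "Permission denied" above comes from running the second grep without sudo; rerunning it with sudo also searches the per-cluster log directory.)

[tripleo-admin@controller-0 ~]$ sudo grep -i error -r /var/log/ceph/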


CONTAINERS
===========
[tripleo-admin@controller-0 ~]$ sudo podman ps 
CONTAINER ID  IMAGE                                                                                                                           COMMAND               CREATED      STATUS           PORTS       NAMES
824cbd62da39  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-horizon:16.2_20230526.1                                       kolla_start           6 weeks ago  Up 10 hours ago              horizon
f3d03c58ed52  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.2_20230526.1                                        kolla_start           6 weeks ago  Up 10 hours ago              iscsid
65971615e059  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-keystone:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              keystone
ce0f1e5ffa58  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1                            kolla_start           6 weeks ago  Up 10 hours ago              swift_object_expirer
35282fd69a26  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_replicator
5c5cf10e91ff  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_server
a2dddd6bccd6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_updater
92915cbcf7fa  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_rsync
c618e98e8a97  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              cinder_api
f655a318253d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              cinder_api_cron
eefb74ccd83b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-scheduler:16.2_20230526.1                              kolla_start           6 weeks ago  Up 10 hours ago              cinder_scheduler
e7dd5a227e0e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              heat_api
d4d492f3e803  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api-cfn:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              heat_api_cfn
d66201008e35  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              heat_api_cron
3544c63637b4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-heat-engine:16.2_20230526.1                                   kolla_start           6 weeks ago  Up 10 hours ago              heat_engine
8ff76e38752a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:16.2_20230526.1                                          kolla_start           6 weeks ago  Up 10 hours ago              logrotate_crond
9dbba7d4b2f2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              neutron_api
0d84f6f18a2f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              nova_conductor
5b6e51882fc1  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-scheduler:16.2_20230526.1                                kolla_start           6 weeks ago  Up 10 hours ago              nova_scheduler
5f9258688351  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-novncproxy:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              nova_vnc_proxy
f91655ecdba7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_auditor
942156362e53  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_reaper
84fd9898003e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_replicator
1bb99f2125f7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              swift_account_server
599c4d4fc6e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_auditor
2e2cc698178b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_replicator
9422624a67e7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_server
381786cee25d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.2_20230526.1                               kolla_start           6 weeks ago  Up 10 hours ago              swift_container_updater
36ccd0726ebc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.2_20230526.1                                  kolla_start           6 weeks ago  Up 10 hours ago              swift_object_auditor
9568a44ca416  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-placement-api:16.2_20230526.1                                 kolla_start           6 weeks ago  Up 10 hours ago              placement_api
9b1c694cf8f0  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              glance_api
98bb550c241a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20230526.1                                    kolla_start           6 weeks ago  Up 10 hours ago              glance_api_cron
f5462a1ba132  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_api
50f422579832  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_metadata
a7c7854b5c03  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.2_20230526.1                            kolla_start           6 weeks ago  Up 10 hours ago              swift_proxy
2f8ca8cc4d10  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.2_20230526.1                                      kolla_start           6 weeks ago  Up 10 hours ago              nova_api_cron
4999273b993b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    -n mon.controller...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mon-controller-0
de6b573beadc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    -n mgr.controller...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-mgr-controller-0
985cc3701391  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n client.crash.c...  8 hours ago  Up 8 hours ago               ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5-crash-controller-0
63f9724035c9  cluster.common.tag/haproxy:pcmklatest                                                                                           /bin/bash /usr/lo...  8 hours ago  Up 8 hours ago               haproxy-bundle-podman-0
002ad445b88b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-memcached:17.1_20230711.2                                     kolla_start           7 hours ago  Up 7 hours ago               memcached
bd20fa130fdc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-404                                                                    --fsid 1606aa1c-0...  7 hours ago  Up 7 hours ago               ecstatic_napier
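
(The randomly named rhceph container "ecstatic_napier", started with --fsid around the time of the hang, is likely a leftover cephadm shell session. If needed, one way to check what it is actually running:)

[tripleo-admin@controller-0 ~]$ sudo podman top ecstatic_napier
[tripleo-admin@controller-0 ~]$ sudo podman inspect --format '{{.Config.Cmd}}' ecstatic_napier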


CEPH HEALTH
===========
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph health detail 
Inferring fsid 1606aa1c-08f8-4e53-9b34-3c74181f65d5
Using recent ceph image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
HEALTH_WARN mons are allowing insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.controller-2 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-1 has auth_allow_insecure_global_id_reclaim set to true
    mon.controller-0 has auth_allow_insecure_global_id_reclaim set to true
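
(This HEALTH_WARN is the standard insecure global_id reclaim warning and is unrelated to the orchestrator hang. Once all clients are updated it can, if desired, be cleared with the usual Ceph setting; shown only for completeness, not as part of this bug's fix:)

[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph config set mon auth_allow_insecure_global_id_reclaim false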


SYSTEMCTL
=========
[tripleo-admin@controller-0 ~]$ sudo systemctl status ceph\*.service
● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mon.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:19:45 UTC; 8h ago
 Main PID: 353380 (conmon)
    Tasks: 30 (limit: 100744)
   Memory: 925.2M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─353392 /dev/init -- /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           │ └─353405 /usr/bin/ceph-mon -n mon.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
           └─supervisor
             └─353380 /usr/bin/conmon --api-version 1 -c 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -u 4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fdbfcc -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/4999273b993b6b9d663f3b4dd1076062f679932b64a549a789b5289a06fd>

Jul 13 06:49:02 controller-0 conmon[353380]: cluster 2023-07-13T06:49:00.696279+0000 mgr.controller-0 (mgr.64108) 15369 : cluster [DBG] pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700  0 mon.controller-0@2(peon) e2 handle_command mon_command({"prefix": "config dump", "format": "json"} v 0) v1
Jul 13 06:49:03 controller-0 conmon[353380]: debug 2023-07-13T06:49:03.943+0000 7f8f9f84e700  0 log_channel(audit) log [DBG] : from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: cluster 2023-07-13T06:49:02.696968+0000 mgr.controller-0 (mgr.64108) 15370 : cluster [DBG] pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[353380]: audit 2023-07-13T06:49:03.945045+0000 mon.controller-0 (mon.2) 7472 : audit [DBG] from='mgr.64108 [fd00:fd00:fd00:3000::277]:0/4282114632' entity='mgr.controller-0' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jul 13 06:49:04 controller-0 conmon[353380]: debug 2023-07-13T06:49:04.811+0000 7f8f9f84e700  0 mon.controller-0@2(peon) e2 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/osd_remove_queue}] v 0) v1
Jul 13 06:49:05 controller-0 conmon[353380]: cluster 2023-07-13T06:49:04.697700+0000 mgr.controller-0 (mgr.64108) 15371 : cluster [DBG] pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:05 controller-0 conmon[353380]: audit 2023-07-13T06:49:04.822347+0000 mon.controller-2 (mon.0) 10302 : audit [INF] from='mgr.64108 ' entity='mgr.controller-0' 
Jul 13 06:49:06 controller-0 conmon[353380]: debug 2023-07-13T06:49:06.422+0000 7f8fa2053700  1 mon.controller-0@2(peon).osd e226 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 71303168 kv_alloc: 872415232
Jul 13 06:49:07 controller-0 conmon[353380]: cluster 2023-07-13T06:49:06.698819+0000 mgr.controller-0 (mgr.64108) 15372 : cluster [DBG] pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:24:37 UTC; 8h ago
 Main PID: 373192 (conmon)
    Tasks: 4 (limit: 100744)
   Memory: 7.9M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─373205 /dev/init -- /usr/bin/ceph-crash -n client.crash.controller-0
           │ └─373237 /usr/libexec/platform-python -s /usr/bin/ceph-crash -n client.crash.controller-0
           └─supervisor
             └─373192 /usr/bin/conmon --api-version 1 -c 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -u 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156>

Jul 12 22:24:36 controller-0 systemd[1]: Starting Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5...
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.193182476 +0000 UTC m=+0.128684558 container create 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, >
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.137771526 +0000 UTC m=+0.073273609 image pull  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.382231705 +0000 UTC m=+0.317733796 container init 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, na>
Jul 12 22:24:37 controller-0 podman[373122]: 2023-07-12 22:24:37.408185636 +0000 UTC m=+0.343687734 container start 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e, n>
Jul 12 22:24:37 controller-0 bash[372881]: 985cc3701391e9501ce0605ac9e0e3aea068cbfae03d398ac7b3f9b42156f725
Jul 12 22:24:37 controller-0 systemd[1]: Started Ceph crash.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5.
Jul 12 22:24:37 controller-0 conmon[373192]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s

● ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service - Ceph mgr.controller-0 for 1606aa1c-08f8-4e53-9b34-3c74181f65d5
   Loaded: loaded (/etc/systemd/system/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-07-12 22:20:34 UTC; 8h ago
 Main PID: 356559 (conmon)
    Tasks: 113 (limit: 100744)
   Memory: 465.5M
   CGroup: /system.slice/system-ceph\x2d1606aa1c\x2d08f8\x2d4e53\x2d9b34\x2d3c74181f65d5.slice/ceph-1606aa1c-08f8-4e53-9b34-3c74181f65d5.service
           ├─container
           │ ├─356572 /dev/init -- /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug 
           │ ├─356584 /usr/bin/ceph-mgr -n mgr.controller-0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug 
           │ ├─356901 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.30 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356902 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.32 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356903 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.13 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356904 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.35 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ ├─356906 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.36 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           │ └─356907 ssh -C -F /tmp/cephadm-conf-etmyg6ii -i /tmp/cephadm-identity-86slzpv0 -o ServerAliveInterval=7 -o ServerAliveCountMax=3 ceph-admin.24.16 sudo python3 -c "import sys;exec(eval(sys.stdin.readline()))"
           └─supervisor
             └─356559 /usr/bin/conmon --api-version 1 -c de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -u de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f4117 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/de6b573beadcf2396d6e085b316d2f5c79f41365a4fc05499f762b545a8f>

Jul 13 06:49:00 controller-0 conmon[356559]: debug 2023-07-13T06:49:00.695+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15323: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:01 controller-0 conmon[356559]: debug 2023-07-13T06:49:01.837+0000 7f8f98c4a700  0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:02 controller-0 conmon[356559]: debug 2023-07-13T06:49:02.695+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15324: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.696+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15325: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.824+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev fa1ba7ce-42cf-4a59-b20b-cd803b331e20 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.825+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev 4df6b9cd-ea4b-45e7-8504-0f3efa3534e6 does not exist
Jul 13 06:49:04 controller-0 conmon[356559]: debug 2023-07-13T06:49:04.826+0000 7f8f9ccd2700  0 [progress WARNING root] complete: ev 82594c29-a3e5-4169-878f-5e82ddbb427f does not exist
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.697+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15326: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
Jul 13 06:49:06 controller-0 conmon[356559]: debug 2023-07-13T06:49:06.841+0000 7f8f98c4a700  0 [progress INFO root] Processing OSDMap change 226..226
Jul 13 06:49:08 controller-0 conmon[356559]: debug 2023-07-13T06:49:08.698+0000 7f8fa5ee4700  0 log_channel(cluster) log [DBG] : pgmap v15327: 513 pgs: 513 active+clean; 1.8 GiB data, 5.3 GiB used, 475 GiB / 480 GiB avail
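
(Because the cephadm orchestrator runs as a module inside the active mgr, a couple of read-only mgr queries that normally return promptly can help distinguish a wedged orchestrator module from a completely unresponsive mgr; these are standard Ceph commands, listed here only as a diagnostic sketch:)

[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph mgr stat
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph mgr module ls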

Comment 3 Juan Badia Payno 2023-07-13 08:25:11 UTC
It seems that there is a workaround:
 @controller-0 $ sudo cephadm shell -- ceph mgr fail controller-0

controller-0 is specified because it is the active mgr shown in the ceph status output.
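
A minimal sketch of applying the workaround (assuming the active mgr is confirmed first; the commands are standard Ceph CLI and the node name comes from the ceph status output above):

[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph mgr stat                 # confirms controller-0 is the active mgr
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph mgr fail controller-0    # forces a standby mgr to take over
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch status              # should now return instead of hanging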

BTW, moving this to DFG:Storage

Comment 4 Juan Badia Payno 2023-07-17 07:54:11 UTC
Possible workaround: https://review.opendev.org/c/openstack/tripleo-ansible/+/887565

Comment 8 Manoj Katari 2023-07-20 10:21:24 UTC
Tested the fix in regular Ceph deployments and no regression was observed.
We only need to verify it in different upgrade scenarios.

Comment 9 Ollie Walsh 2023-07-26 09:57:29 UTC
FWIW I did not encounter this issue in OSPdO initially. When I added DeployedCeph: true I hit this issue. Removing it again resolved the issue.

Discussing on slack with fultonj yesterday "you should be setting DeployedCeph: false... It means you used tripleo client to deploy ceph"

Comment 10 John Fulton 2023-07-26 12:09:40 UTC
(In reply to Ollie Walsh from comment #9)
> FWIW I did not encounter this issue in OSPdO initially. When I added
> DeployedCeph: true I hit this issue. Removing it again resolved the issue.
> 
> Discussing on slack with fultonj yesterday "you should be setting
> DeployedCeph: false... It means you used tripleo client to deploy ceph"

But OSPdO is a special case: it uses Heat to trigger cephadm because it does not use the Python tripleo client.

Comment 14 Manoj Katari 2023-08-07 05:35:09 UTC
Doc text looks good to me.

Comment 27 errata-xmlrpc 2023-09-20 00:29:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:5138

