When using "deployed ceph" [1] to deploy the overcloud the ceph cluster is configured with "set-require-min-compat-client mimic". Glance images created when the ceph cluster is in this state resulted `glance image-delete` timing out with a 504. When directly making the same call Glance glance makes [2] via the RBD CLI, the following hangs: time rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf snap unprotect images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c@snap This seems connected to what is described in the following: https://bugs.launchpad.net/tripleo/+bug/1951433 https://bugzilla.redhat.com/show_bug.cgi?id=2032457 We should ensure deployed ceph does not force "set-require-min-compat-client mimic". Redeploying the same cluster without this setting in the same environment resulted in being able to delete glance images. [1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html [2] https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/rbd.py#L456
I wonder whether we should also address a secondary issue here, with the glance service itself. When the event described above happens, the service is still reported as active:

[root@controller-0 log]# systemctl status tripleo_glance_api
● tripleo_glance_api.service - glance_api container
   Loaded: loaded (/etc/systemd/system/tripleo_glance_api.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-12-15 11:06:16 UTC; 1h 25min ago
 Main PID: 108642 (conmon)
    Tasks: 0 (limit: 203963)
   Memory: 0B
   CGroup: /system.slice/tripleo_glance_api.service
           ‣ 108642 /usr/bin/conmon --api-version 1 -c d034cadf5b93055ecefe6a23f3eb647fac6e762787838f41cd2b36218bf51b35 -u d034cadf5b93055ecefe6a23f3eb647fac6e762787838f41cd2b36218bf51b35 -r /usr/bin/runc -b /var/lib/containers/storage/ov>

Dec 15 11:06:16 controller-0 systemd[1]: Starting glance_api container...
Dec 15 11:06:16 controller-0 systemd[1]: Started glance_api container.

Even though the service broke down badly and stopped logging, and every request from Tempest's side now gets a 503 (presumably from httpd one step in front of it), the container still appears to be running. Only "systemctl restart tripleo_glance_api" recovers the original usable state. I'd expect the service to go down and the container to crash in this case.

Maybe we need to address this as another BZ against Glance?
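For anyone triaging this state, a quick sketch of how the mismatch between systemd's view and the actual API health can be confirmed from the controller. The /healthcheck path assumes the oslo.middleware healthcheck endpoint is enabled in the Glance paste config, and the host placeholder and port 9292 are the usual defaults, not verified here:

  # systemd and podman both still report the container as running...
  systemctl is-active tripleo_glance_api
  sudo podman ps --filter name=glance_api --format '{{.Names}} {{.Status}}'

  # ...while the API itself no longer answers usefully (503 in the broken state).
  curl -s -o /dev/null -w '%{http_code}\n' http://<glance-api-host>:9292/healthcheck

  # Currently the only way back to a usable state:
  sudo systemctl restart tripleo_glance_api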
(In reply to Filip Hubík from comment #2)
> I wonder whether we should also address a secondary issue here, with the
> glance service itself. When the event described above happens, the service
> is still reported as active:
> [...]
> Maybe we need to address this as another BZ against Glance?

We already have that bug here: https://bugzilla.redhat.com/show_bug.cgi?id=2032457
*** Bug 2036868 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543