Bug 2033467 - "deployed ceph" defaults set-require-min-compat-client mimic causing glance problems
Summary: "deployed ceph" defaults set-require-min-compat-client mimic causing glance p...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.0
Assignee: John Fulton
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-16 22:43 UTC by John Fulton
Modified: 2022-09-21 12:18 UTC
CC: 5 users

Fixed In Version: tripleo-ansible-3.3.1-0.20220706140824.fa5422f.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:18:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1951433 0 None None None 2021-12-16 22:43:57 UTC
OpenStack gerrit 821999 0 None NEW Do not set-require-min-compat-client for Ceph by default 2021-12-17 13:59:26 UTC
OpenStack gerrit 822114 0 None NEW Do not set-require-min-compat-client for Ceph by default 2021-12-17 18:00:30 UTC
Red Hat Bugzilla 2032457 1 high CLOSED Attempting to delete an image from Glance that is in use breaks subsequent usage of that image 2022-09-21 12:37:09 UTC
Red Hat Issue Tracker OSP-11871 0 None None None 2021-12-16 22:45:49 UTC
Red Hat Issue Tracker RHOSINFRA-4274 0 None None None 2021-12-16 22:49:27 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:18:36 UTC

Description John Fulton 2021-12-16 22:43:58 UTC
When using "deployed ceph" [1] to deploy the overcloud the ceph cluster is configured with "set-require-min-compat-client mimic". 

Glance images created while the ceph cluster is in this state cannot be deleted: `glance image-delete` times out with a 504. When the same call Glance makes [2] is run directly via the RBD CLI, the following command hangs:

time rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf snap unprotect images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c@snap
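
For context, the cleanup Glance performs when an image is deleted corresponds roughly to the RBD CLI sequence below (a sketch based on [2]; the pool name "images" and the image UUID are taken from the hanging call above, and the exact order of calls inside the driver may differ):

rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf snap unprotect images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c@snap   # this is the call that hangs
rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf snap rm images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c@snap
rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf rm images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c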

This seems connected to what is described in the following:

 https://bugs.launchpad.net/tripleo/+bug/1951433
 https://bugzilla.redhat.com/show_bug.cgi?id=2032457

We should ensure "deployed ceph" does not force "set-require-min-compat-client mimic" by default. Redeploying the same cluster in the same environment without this setting made it possible to delete Glance images again.
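
For anyone verifying a cluster, a minimal way to check whether the flag is in effect (a sketch, assuming the ceph CLI and an admin keyring on a monitor node; the revert command is shown commented out, as an illustration only and not as a verified workaround for this bug):

ceph osd get-require-min-compat-client              # prints the current value, e.g. "mimic"
ceph osd dump | grep min_compat_client              # the same value as stored in the OSD map
# ceph osd set-require-min-compat-client luminous   # illustrative only; the appropriate value is release-dependent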


[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html

[2] https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/rbd.py#L456

Comment 2 Filip Hubík 2021-12-17 13:44:00 UTC
I wonder whether we should also address a secondary issue here with the Glance service itself. When the event described above happens, the service is still reported as running:
[root@controller-0 log]# systemctl status tripleo_glance_api
● tripleo_glance_api.service - glance_api container
   Loaded: loaded (/etc/systemd/system/tripleo_glance_api.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-12-15 11:06:16 UTC; 1h 25min ago
 Main PID: 108642 (conmon)
    Tasks: 0 (limit: 203963)
   Memory: 0B
   CGroup: /system.slice/tripleo_glance_api.service
           ‣ 108642 /usr/bin/conmon --api-version 1 -c d034cadf5b93055ecefe6a23f3eb647fac6e762787838f41cd2b36218bf51b35 -u d034cadf5b93055ecefe6a23f3eb647fac6e762787838f41cd2b36218bf51b35 -r /usr/bin/runc -b /var/lib/containers/storage/ov>
Dec 15 11:06:16 controller-0 systemd[1]: Starting glance_api container...
Dec 15 11:06:16 controller-0 systemd[1]: Started glance_api container.

Though it really broke down catastrophically and stopped logging, any request from Tempest's side now gets a 503 (presumably from httpd one step before it). The container still seems to be running, and only "systemctl restart tripleo_glance_api" recovers the original usable state. I'd expect the service to go down and the container to crash in this case.

Maybe we need to address this as another BZ against Glance?
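
A quick way to demonstrate the mismatch described above (a sketch; the endpoint URL and port are placeholders to be adjusted per deployment, and an unauthenticated request would normally get a 401 rather than a 503):

systemctl is-active tripleo_glance_api                      # still reports "active"
curl -k -i https://overcloud.example.com:13292/v2/images    # answers 503 in the broken state
systemctl restart tripleo_glance_api                        # only this recovers the service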

Comment 3 John Fulton 2021-12-17 13:51:57 UTC
(In reply to Filip Hubík from comment #2)
> I wonder whether we should also address a secondary issue here with the
> Glance service itself.
> [...]
> Maybe we need to address this as another BZ against Glance?

We already have that bug here https://bugzilla.redhat.com/show_bug.cgi?id=2032457

Comment 7 John Fulton 2022-01-04 12:12:58 UTC
*** Bug 2036868 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2022-09-21 12:18:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

