Bug 2033467 - "deployed ceph" defaults set-require-min-compat-client mimic causing glance problems
Summary: "deployed ceph" defaults set-require-min-compat-client mimic causing glance p...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.0
Assignee: John Fulton
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-16 22:43 UTC by John Fulton
Modified: 2022-09-21 12:18 UTC
CC: 5 users

Fixed In Version: tripleo-ansible-3.3.1-0.20220706140824.fa5422f.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:18:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1951433 0 None None None 2021-12-16 22:43:57 UTC
OpenStack gerrit 821999 0 None NEW Do not set-require-min-compat-client for Ceph by default 2021-12-17 13:59:26 UTC
OpenStack gerrit 822114 0 None NEW Do not set-require-min-compat-client for Ceph by default 2021-12-17 18:00:30 UTC
Red Hat Bugzilla 2032457 1 high CLOSED Attempting to delete an image from Glance that is in use breaks subsequent usage of that image 2022-09-21 12:37:09 UTC
Red Hat Issue Tracker OSP-11871 0 None None None 2021-12-16 22:45:49 UTC
Red Hat Issue Tracker RHOSINFRA-4274 0 None None None 2021-12-16 22:49:27 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:18:36 UTC

Description John Fulton 2021-12-16 22:43:58 UTC
When using "deployed ceph" [1] to deploy the overcloud the ceph cluster is configured with "set-require-min-compat-client mimic". 

Glance images created while the ceph cluster is in this state cannot be deleted: `glance image-delete` times out with a 504. When the same call Glance makes [2] is run directly via the RBD CLI, the following command hangs:

time rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf snap unprotect images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c@snap
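
For context, the cleanup Glance performs when an image is deleted corresponds roughly to the RBD CLI sequence below (a sketch based on [2]; the pool name "images" and the image UUID are taken from the hanging call above, and the exact order of calls inside the driver may differ):

rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf snap unprotect images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c@snap   # this is the call that hangs
rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf snap rm images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c@snap
rbd -n client.openstack -k /etc/ceph/ceph.client.openstack.keyring --conf /etc/ceph/ceph.conf rm images/d7e638c0-3030-4ac0-a9a9-e9bd340e993c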

This seems connected to what is described in the following:

 https://bugs.launchpad.net/tripleo/+bug/1951433
 https://bugzilla.redhat.com/show_bug.cgi?id=2032457

We should ensure "deployed ceph" does not force "set-require-min-compat-client mimic" by default. Redeploying the same cluster in the same environment without this setting made it possible to delete Glance images again.
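
For anyone verifying a cluster, a minimal way to check whether the flag is in effect (a sketch, assuming the ceph CLI and an admin keyring on a monitor node; the revert command is shown commented out, as an illustration only and not as a verified workaround for this bug):

ceph osd get-require-min-compat-client              # prints the current value, e.g. "mimic"
ceph osd dump | grep min_compat_client              # the same value as stored in the OSD map
# ceph osd set-require-min-compat-client luminous   # illustrative only; the appropriate value is release-dependent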


[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html

[2] https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/rbd.py#L456

Comment 2 Filip Hubík 2021-12-17 13:44:00 UTC
I wonder whether we should also address a secondary issue here with the Glance service itself. When the event described above happens, the service is still reported as running:
[root@controller-0 log]# systemctl status tripleo_glance_api
● tripleo_glance_api.service - glance_api container
   Loaded: loaded (/etc/systemd/system/tripleo_glance_api.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-12-15 11:06:16 UTC; 1h 25min ago
 Main PID: 108642 (conmon)
    Tasks: 0 (limit: 203963)
   Memory: 0B
   CGroup: /system.slice/tripleo_glance_api.service
           ‣ 108642 /usr/bin/conmon --api-version 1 -c d034cadf5b93055ecefe6a23f3eb647fac6e762787838f41cd2b36218bf51b35 -u d034cadf5b93055ecefe6a23f3eb647fac6e762787838f41cd2b36218bf51b35 -r /usr/bin/runc -b /var/lib/containers/storage/ov>
Dec 15 11:06:16 controller-0 systemd[1]: Starting glance_api container...
Dec 15 11:06:16 controller-0 systemd[1]: Started glance_api container.

Though it really broke down catastrophically and stopped logging, any request from Tempest's side now gets a 503 (presumably from httpd one step before it). The container still seems to be running, and only "systemctl restart tripleo_glance_api" recovers the original usable state. I'd expect the service to go down and the container to crash in this case.

Maybe we need to address this as another BZ against Glance?
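
A quick way to demonstrate the mismatch described above (a sketch; the endpoint URL and port are placeholders to be adjusted per deployment, and an unauthenticated request would normally get a 401 rather than a 503):

systemctl is-active tripleo_glance_api                      # still reports "active"
curl -k -i https://overcloud.example.com:13292/v2/images    # answers 503 in the broken state
systemctl restart tripleo_glance_api                        # only this recovers the service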

Comment 3 John Fulton 2021-12-17 13:51:57 UTC
(In reply to Filip Hubík from comment #2)
> I wonder whether we should also address a secondary issue here with the
> Glance service itself.
> [...]
> Maybe we need to address this as another BZ against Glance?

We already have that bug here https://bugzilla.redhat.com/show_bug.cgi?id=2032457

Comment 7 John Fulton 2022-01-04 12:12:58 UTC
*** Bug 2036868 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2022-09-21 12:18:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

