Bug 1947784 - Glance Multi Store: Missing ceph configuration for a DCN site causes glance_api containers stuck in a restart loop
Summary: Glance Multi Store: Missing ceph configuration for a DCN site causes glance_ap...
Keywords:
Status: CLOSED DUPLICATE of bug 1947786
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-glance
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Cyril Roelandt
QA Contact: Mike Abrams
Docs Contact: Andy Stillman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-09 09:11 UTC by Sadique Puthen
Modified: 2022-08-26 15:17 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-26 19:41:57 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-2112 0 None None None 2022-08-26 15:17:09 UTC

Description Sadique Puthen 2021-04-09 09:11:48 UTC
Description of problem:

I added the glance store details for a second site, DCN2, to my central cluster[1]. During this process I forgot to pass the Ceph details for the second Ceph cluster, which are normally generated by running the command below.

sudo -E openstack overcloud export ceph --stack edge-1,edge-ceph --config-download-dir /var/lib/mistral --output-file /home/stack/templates/osp-16-1/edge-ceph.yaml

The end result is that the glance_api containers on the central site are stuck in a restart loop, which takes glance completely down for the central site and for all DCN sites that were already working properly. When I run a stack update again with the Ceph details for the new DCN site, everything returns to normal. Glance is stuck in a restart loop because it cannot read the new Ceph cluster's details, but the impact is a total cluster outage.
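For context, each rbd backend in a Glance multi-store deployment points at a per-cluster Ceph configuration file; if that file is missing, the store cannot initialize. A minimal sketch of such a config — the section name, paths, and values below are illustrative, not taken from this deployment:

```ini
[DEFAULT]
enabled_backends = central:rbd, dcn2:rbd

# Backend for the new DCN site; rbd_store_ceph_conf must point at a
# ceph.conf that actually exists on the glance_api host, otherwise
# librados fails at startup.
[dcn2]
rbd_store_ceph_conf = /etc/ceph/dcn2.conf
rbd_store_user = openstack
rbd_store_pool = images
```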

We should solve this. Either we add validation in tripleo to reject a multi-store config when the corresponding Ceph details are not provided, or glance_api should be fixed so that it stays up and rejects image copies or VM provisioning only for the new site whose Ceph configuration is missing. In any case, a total cluster outage is not acceptable for a customer.

The error in the log file is as below.

2021-04-09T09:07:09.131937653+00:00 stderr F + echo 'Running command: '\''/usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf'\'''
2021-04-09T09:07:09.131951286+00:00 stdout F Running command: '/usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf'
2021-04-09T09:07:09.131985651+00:00 stderr F + exec /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf
2021-04-09T09:07:11.102336424+00:00 stderr F ERROR: [errno 2] error calling conf_read_file
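The `conf_read_file` error is librados reporting that it could not read the Ceph configuration file referenced by a store (errno 2: file not found). A pre-check along these lines could catch the problem before glance_api enters a restart loop; this is a hypothetical sketch (the function name is invented, and it assumes the standard `rbd_store_ceph_conf` option name in glance-api.conf):

```shell
# Hypothetical pre-check: print every Ceph conf path referenced by an rbd
# backend in the given glance-api.conf that does not exist on disk.
check_glance_ceph_confs() {
    grep -E '^[[:space:]]*rbd_store_ceph_conf' "$1" \
        | sed 's/^[^=]*=[[:space:]]*//' \
        | while read -r path; do
              [ -f "$path" ] || echo "missing ceph conf: $path"
          done
}
```

Run against `/etc/glance/glance-api.conf` on the controller, any output would indicate a backend that will fail exactly as in the log above.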


[1]  https://gitlab.cee.redhat.com/sputhenp/lab/-/blob/master/templates/osp-16-1/glance-update.yaml#L13-18

Comment 2 Cyril Roelandt 2021-04-26 18:26:58 UTC
@Giulio: This sounds like a good idea, but I'm afraid it will lead to configuration issues going unnoticed. Admins will believe that multiple Ceph clusters are used by Glance, when really Glance decided to run with only one store that is actually reachable. How do we make sure this does not backfire?


@Sadique: Which log file do your logs come from?

Comment 3 Cyril Roelandt 2021-04-26 19:41:57 UTC
OK, I see the same bug filed as https://bugzilla.redhat.com/show_bug.cgi?id=1947786 so let's continue the discussion there.


I'm closing this bug as a duplicate of #1947786.

*** This bug has been marked as a duplicate of bug 1947786 ***

