Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1947784

Summary: Glance Multi Store: Missing Ceph configuration for a DCN site causes glance_api containers to get stuck in a restart loop
Product: Red Hat OpenStack Reporter: Sadique Puthen <sputhenp>
Component: openstack-glance Assignee: Cyril Roelandt <cyril>
Status: CLOSED DUPLICATE QA Contact: Mike Abrams <mabrams>
Severity: medium Docs Contact: Andy Stillman <astillma>
Priority: unspecified    
Version: 16.1 (Train) CC: athomas, cyril, eglynn, gfidente, johfulto
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-04-26 19:41:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sadique Puthen 2021-04-09 09:11:48 UTC
Description of problem:

I added the glance store details for my second site, DCN2, to my central cluster [1]. During this process I failed to pass the Ceph details for the second Ceph cluster, which are normally generated by running the command below.

sudo -E openstack overcloud export ceph --stack edge-1,edge-ceph --config-download-dir /var/lib/mistral --output-file /home/stack/templates/osp-16-1/edge-ceph.yaml
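
As a reference for what was missed: the export command typically produces a small Heat environment file describing the edge Ceph cluster. The sketch below is illustrative only; the parameter names follow the TripleO DCN documentation, and the cluster name, FSID, monitor IPs, and key are placeholders, not values from this deployment.

# Illustrative sketch of the kind of file generated by `openstack overcloud export ceph`;
# the real values come from the deployed edge stack and will differ.
parameter_defaults:
  CephExternalMultiConfig:
    - cluster: edge-ceph                                 # placeholder cluster name
      fsid: 00000000-0000-0000-0000-000000000000         # placeholder FSID
      external_cluster_mon_ips: 192.168.24.10,192.168.24.11,192.168.24.12
      keys:
        - name: client.openstack
          caps:
            mgr: "allow *"
            mon: "profile rbd"
            osd: "profile rbd pool=images, profile rbd pool=vms"
          key: "AQD...placeholder...=="                  # placeholder cephx key
          mode: "0600"
      dashboard_enabled: false

Without an environment file like this being passed to the central stack update, no /etc/ceph configuration for the new cluster gets laid down on the controllers.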

The end result is that the glance_api containers on the central site are stuck in a restart loop, which takes Glance completely down for the central site as well as for all DCN sites that were already working properly. When I run the stack update again with the Ceph details for the new DCN site, everything is back to normal. Glance is obviously stuck in a restart loop because it cannot read the configuration of the new Ceph cluster, but the impact is a total cluster outage.
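
For context, the second store referenced in [1] is typically defined through the GlanceMultistoreConfig parameter; the sketch below is illustrative only (the names are placeholders and exact parameter names vary by release). Roughly, each backend's CephClusterName tells glance-api which /etc/ceph/<cluster>.conf to read, so a backend whose Ceph configuration was never exported leaves glance-api pointing at a file that does not exist on the controllers, and the service dies at startup with the conf_read_file error shown further below.

# Illustrative sketch of an environment file adding a second (DCN) Glance store;
# names are placeholders, see [1] for the real template.
parameter_defaults:
  GlanceBackend: rbd
  GlanceStoreDescription: 'central rbd glance store'
  CephClusterName: central
  GlanceMultistoreConfig:
    dcn2:
      GlanceBackend: rbd
      GlanceStoreDescription: 'dcn2 rbd glance store'
      CephClusterName: dcn2          # glance-api will look for /etc/ceph/dcn2.conf
      CephClientUserName: openstack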

We should solve this. Either we need to add validation in TripleO to ensure that a multi-store configuration is rejected if the corresponding Ceph details are not provided, or glance_api should be fixed so that it stays up and rejects image copies or VM provisioning only for the new site whose Ceph configuration details are missing. In any case, a total cluster outage is not acceptable for a customer.

The error in the log file is shown below.

2021-04-09T09:07:09.131937653+00:00 stderr F + echo 'Running command: '\''/usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf'\'''
2021-04-09T09:07:09.131951286+00:00 stdout F Running command: '/usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf'
2021-04-09T09:07:09.131985651+00:00 stderr F + exec /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf
2021-04-09T09:07:11.102336424+00:00 stderr F ERROR: [errno 2] error calling conf_read_file


[1]  https://gitlab.cee.redhat.com/sputhenp/lab/-/blob/master/templates/osp-16-1/glance-update.yaml#L13-18

Comment 2 Cyril Roelandt 2021-04-26 18:26:58 UTC
@Giulio: This sounds like a good idea, but I'm afraid it will lead to configuration issues going unnoticed. Admins will believe that multiple Ceph clusters are used by Glance, when really Glance decided to run with only one store that is actually reachable. How do we make sure this does not backfire?


@Sadique: Which log file do your logs come from?

Comment 3 Cyril Roelandt 2021-04-26 19:41:57 UTC
OK, I see the same bug filed as https://bugzilla.redhat.com/show_bug.cgi?id=1947786 so let's continue the discussion there.


I'm closing this bug as a duplicate of #1947786.

*** This bug has been marked as a duplicate of bug 1947786 ***