Description of problem:

I added the details of my second Glance store, for DCN2, to my central cluster [1]. During this process I forgot to pass the Ceph details for the second Ceph cluster, which are normally generated by running the command below:

sudo -E openstack overcloud export ceph --stack edge-1,edge-ceph --config-download-dir /var/lib/mistral --output-file /home/stack/templates/osp-16-1/edge-ceph.yaml

As a result, the glance_api containers on the central site are stuck in a restart loop, which takes Glance completely down for the central site and for all DCN sites that were already working properly. When I run the stack update again with the Ceph details for the new DCN site, everything returns to normal.

Glance is evidently stuck in the restart loop because it cannot read the details of the new Ceph cluster, but the impact is a total cluster outage. We should solve this. Either TripleO should add validation so that a multi-store configuration is rejected when the corresponding Ceph details are not provided, or glance_api should be fixed so that it stays up and rejects image copy or VM provisioning only for the new site whose Ceph configuration details are missing. In any case, a total cluster outage is not acceptable for a customer.

The error in the log file is shown below.
2021-04-09T09:07:09.131937653+00:00 stderr F + echo 'Running command: '\''/usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf'\'''
2021-04-09T09:07:09.131951286+00:00 stdout F Running command: '/usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf'
2021-04-09T09:07:09.131985651+00:00 stderr F + exec /usr/bin/glance-api --config-file /usr/share/glance/glance-api-dist.conf --config-file /etc/glance/glance-api.conf --config-file /etc/glance/glance-image-import.conf
2021-04-09T09:07:11.102336424+00:00 stderr F ERROR: [errno 2] error calling conf_read_file

[1] https://gitlab.cee.redhat.com/sputhenp/lab/-/blob/master/templates/osp-16-1/glance-update.yaml#L13-18
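The validation option proposed in the description could look roughly like the sketch below. It is only an illustration, not existing TripleO or Glance code: the function name missing_ceph_confs, the sample store names (central, dcn2) and the sample file paths are hypothetical. It assumes the standard Glance multi-store layout, where enabled_backends in [DEFAULT] lists name:type pairs and each rbd store has its own section with rbd_store_ceph_conf. Treating an unset rbd_store_ceph_conf as a problem is a simplification made for this sketch.

```python
import configparser
import os


def missing_ceph_confs(glance_conf_text, exists=os.path.exists):
    """Return (backend, path) pairs for rbd stores whose Ceph conf file is absent.

    Hypothetical pre-deployment check; `exists` is injectable so the check
    can be exercised without touching the real filesystem.
    """
    # interpolation=None keeps any '%' characters in values literal;
    # strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(glance_conf_text)

    problems = []
    backends = parser.get("DEFAULT", "enabled_backends", fallback="")
    for entry in backends.split(","):
        name, _, store_type = entry.strip().partition(":")
        if store_type.strip() != "rbd":
            continue  # only Ceph-backed stores need a Ceph conf file
        name = name.strip()
        path = parser.get(name, "rbd_store_ceph_conf", fallback=None)
        # Assumption of this sketch: an unset option counts as missing.
        if path is None or not exists(path):
            problems.append((name, path))
    return problems


# Hypothetical two-store configuration: central plus the new DCN2 site.
SAMPLE = """
[DEFAULT]
enabled_backends = central:rbd,dcn2:rbd

[central]
rbd_store_ceph_conf = /etc/ceph/central.conf

[dcn2]
rbd_store_ceph_conf = /etc/ceph/dcn2.conf
"""

if __name__ == "__main__":
    for backend, conf_path in missing_ceph_confs(SAMPLE):
        print(f"store {backend!r}: Ceph conf {conf_path!r} not found")
```

Run before the stack update is applied, a check of this kind would reject the configuration up front, so glance_api would not fail at startup with "error calling conf_read_file".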
@Giulio: This sounds like a good idea, but I'm afraid it will lead to configuration issues going unnoticed. Admins will believe that Glance is using multiple Ceph clusters, when in fact Glance decided to run with only the one store that is actually reachable. How do we make sure this does not backfire?

@Sadique: Which log file do your logs come from?
OK, I see the same bug filed as https://bugzilla.redhat.com/show_bug.cgi?id=1947786 so let's continue the discussion there. I'm closing this bug as a duplicate of #1947786.

*** This bug has been marked as a duplicate of bug 1947786 ***