Bug 1947786
| Field | Value |
|---|---|
| Summary | Glance Multi Store: Missing ceph configuration for a DCN site causes glance_api containers stuck in a restart loop |
| Product | Red Hat OpenStack |
| Component | openstack-glance |
| Version | 16.1 (Train) |
| Status | NEW |
| Severity | medium |
| Priority | unspecified |
| Reporter | Sadique Puthen <sputhenp> |
| Assignee | Cyril Roelandt <cyril> |
| QA Contact | --- |
| Docs Contact | Andy Stillman <astillma> |
| CC | athomas, eglynn, ekuvaja, gfidente, johfulto, pdeore, udesale |
| Flags | cyril: needinfo? (pdeore) |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Type | Bug |
Description (Sadique Puthen, 2021-04-09 09:15:09 UTC)
Created attachment 1770521 [details]: glance-api.log from the /var/log/containers/stdout directory

Created attachment 1770532 [details]: /var/log/containers/glance/api.log
Can you confirm the version of python-glance-store and the version of the glance container on your system? This looks similar to bug 1832667.

# podman exec -it glance_api rpm -qa | grep glance-store
python3-glance-store-1.0.2-1.20201114020939.bc62bb4.el8ost.noarch

So he has a version newer than the one that fixed bug 1832667 (python-glance-store-1.0.2-0.20200511193428.a622766.el8ost). Also, here is the container tag he used:

# podman images | grep glance
satellite.redhat.local:5000/sadique_openstack-openstack-16-1-osp_16_1_containers-osp16_containers-glance-api 16.1 3281c69acf43 2 weeks ago 972 MB

There is no reason why glance-api wouldn't be coming up. Even the logs show that the service comes up and merely reports that one of the stores is misconfigured and will be accessed read-only. Do we have pacemaker logs available that explain why the service gets killed after a while? (We can see from the logs that it happily answers healthcheck polls for a while before getting killed and restarted.)

I was under the impression that pacemaker is not used for making glance_api HA.

Not on the DCN edge sites, but it is in control of the central site services.

pcs status on central does not have glance_api under its control. It is a standalone container:
# pcs status
Cluster name: tripleo_cluster
Cluster Summary:
* Stack: corosync
* Current DC: controller-1 (version 2.0.3-5.el8_2.3-4b1f869f0f) - partition with quorum
* Last updated: Fri Apr 9 14:18:39 2021
* Last change: Fri Apr 9 06:58:41 2021 by root via cibadmin on controller-1
* 15 nodes configured
* 47 resource instances configured
Node List:
* Online: [ controller-1 controller-2 controller-3 ]
* GuestOnline: [ galera-bundle-0@controller-1 galera-bundle-1@controller-2 galera-bundle-2@controller-3 ovn-dbs-bundle-0@controller-1 ovn-dbs-bundle-1@controller-2 ovn-dbs-bundle-2@controller-3 rabbitmq-bundle-0@controller-1 rabbitmq-bundle-1@controller-2 rabbitmq-bundle-2@controller-3 redis-bundle-0@controller-1 redis-bundle-1@controller-2 redis-bundle-2@controller-3 ]
Full List of Resources:
* Container bundle set: galera-bundle [cluster.common.tag/sadique_openstack-openstack-16-1-osp_16_1_containers-osp16_containers-mariadb:pcmklatest]:
* galera-bundle-0 (ocf::heartbeat:galera): Master controller-1
* galera-bundle-1 (ocf::heartbeat:galera): Master controller-2
* galera-bundle-2 (ocf::heartbeat:galera): Master controller-3
* Container bundle set: rabbitmq-bundle [cluster.common.tag/sadique_openstack-openstack-16-1-osp_16_1_containers-osp16_containers-rabbitmq:pcmklatest]:
* rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
* rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
* rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-3
* Container bundle set: redis-bundle [cluster.common.tag/sadique_openstack-openstack-16-1-osp_16_1_containers-osp16_containers-redis:pcmklatest]:
* redis-bundle-0 (ocf::heartbeat:redis): Master controller-1
* redis-bundle-1 (ocf::heartbeat:redis): Slave controller-2
* redis-bundle-2 (ocf::heartbeat:redis): Slave controller-3
* ip-172.16.0.150 (ocf::heartbeat:IPaddr2): Started controller-1
* ip-172.16.200.150 (ocf::heartbeat:IPaddr2): Started controller-2
* ip-172.20.0.151 (ocf::heartbeat:IPaddr2): Started controller-3
* ip-172.20.0.150 (ocf::heartbeat:IPaddr2): Started controller-1
* ip-172.18.0.150 (ocf::heartbeat:IPaddr2): Started controller-2
* ip-172.19.0.150 (ocf::heartbeat:IPaddr2): Started controller-3
* Container bundle set: haproxy-bundle [cluster.common.tag/sadique_openstack-openstack-16-1-osp_16_1_containers-osp16_containers-haproxy:pcmklatest]:
* haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started controller-1
* haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-2
* haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started controller-3
* Container bundle set: ovn-dbs-bundle [cluster.common.tag/sadique_openstack-openstack-16-1-osp_16_1_containers-osp16_containers-ovn-northd:pcmklatest]:
* ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-1
* ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-2
* ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-3
* ip-172.20.0.110 (ocf::heartbeat:IPaddr2): Started controller-1
* Container bundle: openstack-cinder-volume [cluster.common.tag/sadique_openstack-openstack-16-1-osp_16_1_containers-osp16_containers-cinder-volume:pcmklatest]:
* openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started controller-2
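Since pacemaker is not supervising glance_api, the restart loop has to come from whatever does supervise the container. A minimal diagnostic sketch, assuming the standard TripleO systemd unit and healthcheck names on OSP 16 (verify these names on the affected node; they are an assumption here, not taken from the logs above):

```bash
# Systemd unit created for the standalone container (name assumed):
systemctl status tripleo_glance_api.service

# The healthcheck runs from a companion unit; its log shows whether the
# polls were failing before the container was cycled:
journalctl -u tripleo_glance_api -u tripleo_glance_api_healthcheck --since -1h

# Restart count and current state straight from podman:
podman inspect glance_api --format '{{ .RestartCount }} {{ .State.Status }}'
```

If the unit log shows systemd itself stopping the container, then the unit's restart policy, not glance, is what produces the loop described in the summary.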
It is expected [1] that glance is not under pcs control in 16. Let's look into why glance is failing, since it should be able to handle one misconfigured store.

[1] https://specs.openstack.org/openstack/tripleo-specs/specs/newton/pacemaker-next-generation-architecture.html

Suggested reproducer: configure two glance backends with RBD [1] but do not provide the ceph configuration files for one of the backends [2].

[1]
"""
[central]
rbd_store_ceph_conf=/etc/ceph/central.conf
rbd_store_user=openstack
rbd_store_pool=images
store_description=central rbd glance store

[edge-1]
rbd_store_ceph_conf=/etc/ceph/edge1.conf
rbd_store_user=openstack
rbd_store_pool=images
store_description=edge-1 rbd glance store
"""

[2] Note that /etc/ceph/edge1.conf is absent:
"""
[root@controller-2 ~]# ls -l /etc/ceph/
total 8
-rw-------. 1 167 167 201 Apr 15 16:18 central.client.openstack.keyring
-rw-r--r--. 1 root root 658 Apr 15 16:18 central.conf
[root@controller-2 ~]#
"""

*** Bug 1947784 has been marked as a duplicate of this bug. ***

@Pranali: Do you think ooo could check that the configuration is good enough before starting the services?
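One detail worth noting for anyone replaying the reproducer: snippet [1] shows only the per-backend sections. In a Train multistore deployment those sections are wired up through enabled_backends and default_backend, roughly as sketched below (the backend names come from the example above; the rest is illustrative, not taken from this environment):

```ini
[DEFAULT]
# Each entry maps a backend name (the section name) to a store driver.
enabled_backends = central:rbd, edge-1:rbd

[glance_store]
# Store used when an image is created without an explicit backend.
default_backend = central
```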
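On the pre-flight idea: one option is a check along these lines run before the container starts. This is a hypothetical sketch, not existing tripleo code; the config path is the usual OSP 16 host-side location (inside the container the same file is /etc/glance/glance-api.conf):

```bash
#!/bin/bash
# Hypothetical pre-flight check: refuse to start glance-api when any RBD
# backend points at a ceph configuration file that does not exist.
conf=/var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf

rc=0
# Collect every rbd_store_ceph_conf value from the config and test each one.
for ceph_conf in $(awk -F'=' '/^rbd_store_ceph_conf/ {gsub(/[[:space:]]/, "", $2); print $2}' "$conf"); do
    if [ ! -f "$ceph_conf" ]; then
        echo "missing ceph configuration: $ceph_conf" >&2
        rc=1
    fi
done
exit $rc
```

Run against the reproducer above, a check like this would flag /etc/ceph/edge1.conf and stop the deployment before glance_api ever enters the restart loop.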