Bug 2188477 - DCN: GlanceApiEdge fails to deploy on DistributedComputeHCI nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 17.1
Assignee: Alan Bishop
QA Contact: msava
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-20 21:19 UTC by Marian Krcmarik
Modified: 2023-08-16 01:15 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-1.20230519151004.f602c2b.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:14:48 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 882570 | 0 | None | MERGED | Fix glance-api deployment in a DCN environment | 2023-06-27 12:40:47 UTC
Red Hat Issue Tracker | OSP-24430 | 0 | None | None | None | 2023-04-20 21:21:44 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:15:18 UTC

Description Marian Krcmarik 2023-04-20 21:19:28 UTC
Description of problem:
In a DCN deployment, glance-api fails to deploy at the DCN site, specifically on DistributedComputeHCI nodes.

The actual error message is:
2023-04-19 19:57:45.936127 |                                      |    WARNING | ERROR: Can't run container glance_api_internal

stderr: Error: statfs /var/lib/kolla/config_files/glance_api.json: no such file or directory
2023-04-19 19:57:45.942951 |                                      |    WARNING | ERROR: Can't run container glance_api_internal_tls_proxy

stderr: Error: statfs /var/lib/kolla/config_files/glance_api_tls_proxy.json: no such file or directory
2023-04-19 19:57:45.948045 | 52540056-ef02-9140-41c7-00000000c797 |      FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn1-computehci-1 | error={"changed": false, "msg": "Failed containers: glance_api_internal, glance_api_internal_tls_proxy"}
2023-04-19 19:57:45.964199 | 52540056-ef02-9140-41c7-00000000c797 |     TIMING | tripleo_container_manage : Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn1-computehci-1 | 0:42:21.014482 | 13.12s
2023-04-19 19:57:46.003735 |                                      |    WARNING | ERROR: Can't run container glance_api_internal

stderr: Error: statfs /var/lib/kolla/config_files/glance_api.json: no such file or directory
2023-04-19 19:57:46.005840 |                                      |    WARNING | ERROR: Can't run container glance_api_internal_tls_proxy

stderr: Error: statfs /var/lib/kolla/config_files/glance_api_tls_proxy.json: no such file or directory
2023-04-19 19:57:46.007860 | 52540056-ef02-9140-41c7-00000000c700 |      FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn1-computehci-0 | error={"changed": false, "msg": "Failed containers: glance_api_internal, glance_api_internal_tls_proxy"}

The problem seems to be that the files specified as the host-side sources of the container volume mounts are missing, specifically:
/var/lib/kolla/config_files/glance_api_tls_proxy.json
/var/lib/kolla/config_files/glance_api.json

If I log in to the node, I can see the following:
[tripleo-admin@dcn1-computehci-0 ~]$ ls /var/lib/kolla/config_files/ | grep glance
glance_api_internal.json
glance_api_internal_tls_proxy.json

But the volume mounts look like this:
[root@dcn1-computehci-0 tripleo-admin]# cat /var/lib/tripleo-config/container-startup-config/step_4/glance_api_internal_tls_proxy.json 
{
  "environment": {
    "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS",
    "TRIPLEO_CONFIG_HASH": "5ea6953245003848ac83fd667e3a957c"
  },
  "image": "site-undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-glance-api:17.1_20230404.1",
  "net": "host",
  "restart": "always",
  "start_order": 3,
  "user": "root",
  "volumes": [
    "/etc/hosts:/etc/hosts:ro",
    "/etc/localtime:/etc/localtime:ro",
    "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro",
    "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro",
    "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro",
    "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro",
    "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro",
    "/dev/log:/dev/log",
    "/etc/ipa/ca.crt:/etc/ipa/ca.crt:ro",
    "/etc/puppet:/etc/puppet:ro",
    "/var/log/containers/glance:/var/log/glance:z",
    "/var/log/containers/httpd/glance:/var/log/httpd:z",
    "/var/lib/kolla/config_files/glance_api_tls_proxy.json:/var/lib/kolla/config_files/config.json:ro",
    "/var/lib/config-data/puppet-generated/glance_api_internal:/var/lib/kolla/config_files/src:ro",
    "/etc/pki/tls/certs/httpd:/etc/pki/tls/certs/httpd:ro",
    "/etc/pki/tls/private/httpd:/etc/pki/tls/private/httpd:ro"
  ]
}

When the container is started, it looks for /var/lib/kolla/config_files/glance_api_tls_proxy.json rather than /var/lib/kolla/config_files/glance_api_internal_tls_proxy.json, which is the file that was actually generated.
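
The mismatch can be checked mechanically. The following is a minimal Python sketch, not part of TripleO: it uses the `ls` output and two of the volume entries quoted above as literal data, and the helper name `missing_kolla_sources` is hypothetical. It lists the bind-mount sources under the kolla config directory that do not exist on the host.

```python
# Kolla config files actually present on dcn1-computehci-0 (from the ls output above).
present = {
    "/var/lib/kolla/config_files/glance_api_internal.json",
    "/var/lib/kolla/config_files/glance_api_internal_tls_proxy.json",
}

# Two of the volume entries from the glance_api_internal_tls_proxy startup config.
volumes = [
    "/var/lib/kolla/config_files/glance_api_tls_proxy.json:/var/lib/kolla/config_files/config.json:ro",
    "/var/lib/config-data/puppet-generated/glance_api_internal:/var/lib/kolla/config_files/src:ro",
]

KOLLA_DIR = "/var/lib/kolla/config_files/"


def missing_kolla_sources(volumes, present):
    """Hypothetical helper: return host paths under KOLLA_DIR that are mounted but absent."""
    missing = []
    for vol in volumes:
        # The host-side source is everything before the first ':' of the volume spec.
        host_path = vol.split(":", 1)[0]
        if host_path.startswith(KOLLA_DIR) and host_path not in present:
            missing.append(host_path)
    return missing


missing = missing_kolla_sources(volumes, present)
print(missing)
```

With the data from this report, the only entry flagged is the glance_api_tls_proxy.json source, which matches the statfs error in the deployment log.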

It works on Controller nodes because the THT Controller role includes OS::TripleO::Services::GlanceApi, which causes /var/lib/kolla/config_files/glance_api_tls_proxy.json to be generated. The DistributedComputeHCI role includes only the GlanceApiEdge service.

To work around the problem, I manually made the following change in THT:
diff --git a/deployment/glance/glance-api-internal-container-puppet.yaml b/deployment/glance/glance-api-internal-container-puppet.yaml
--- a/deployment/glance/glance-api-internal-container-puppet.yaml	(revision 1393d39be367db3acb02508e0e858395a4e4fefa)
+++ b/deployment/glance/glance-api-internal-container-puppet.yaml	(date 1682024903117)
@@ -152,7 +152,7 @@
                   - get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api]
                   - volumes:
                       yaql:
-                        expression: $.data.vols.select($.replace('puppet-generated/glance_api', 'puppet-generated/glance_api_internal'))
+                        expression: $.data.vols.select($.replace('glance_api', 'glance_api_internal'))
                         data:
                           vols: {get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api, volumes]}
               glance_api_internal_tls_proxy:
@@ -162,7 +162,7 @@
                       - get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api_tls_proxy]
                       - volumes:
                           yaql:
-                            expression: $.data.vols.select($.replace('puppet-generated/glance_api', 'puppet-generated/glance_api_internal'))
+                            expression: $.data.vols.select($.replace('glance_api', 'glance_api_internal'))
                             data:
                               vols: {get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api_tls_proxy, volumes]}

But I am not sure what the right way to fix it is. I can provide an environment if needed, and I may be wrong about the root cause.
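
The effect of the two yaql expressions in the diff above can be sketched in Python. This is only an emulation under stated assumptions: Python's `str.replace` stands in for the yaql `replace` call, the list comprehension stands in for `select`, and the input volume list is an assumed pre-rewrite form of the two relevant GlanceApi entries (the function name `rewrite` is hypothetical).

```python
# Assumed pre-rewrite volume entries inherited from the GlanceApi service.
vols = [
    "/var/lib/kolla/config_files/glance_api_tls_proxy.json:/var/lib/kolla/config_files/config.json:ro",
    "/var/lib/config-data/puppet-generated/glance_api:/var/lib/kolla/config_files/src:ro",
]


def rewrite(vols, old, new):
    """Emulate $.data.vols.select($.replace(old, new)) with plain substring replacement."""
    return [v.replace(old, new) for v in vols]


# Expression shipped in the template: only the puppet-generated path is renamed.
original = rewrite(vols, "puppet-generated/glance_api", "puppet-generated/glance_api_internal")

# Workaround expression: every occurrence of 'glance_api' is renamed.
workaround = rewrite(vols, "glance_api", "glance_api_internal")

for label, result in (("original", original), ("workaround", workaround)):
    print(label)
    for entry in result:
        print("  " + entry)
```

Under these assumptions the shipped expression leaves the kolla JSON source as glance_api_tls_proxy.json, while the workaround points it at glance_api_internal_tls_proxy.json. Note that a blanket substring replacement rewrites every path containing `glance_api`, which is broader than the original expression.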

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-14.3.1-1.20230402010807.563f2cd.el9ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy a site in a DCN environment with glance multistore, that is, with glance-api deployed at the edge site on DistributedComputeHCI nodes.

Actual results:
The site deployment fails because the glance-api containers fail to start.

Expected results:
Successful deployment of the DCN site

Additional info:
The Glance internal API was introduced downstream (d/s) in 17.1. Upstream release note: https://github.com/openstack/tripleo-heat-templates/blob/master/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml

Comment 2 Alan Bishop 2023-05-04 14:16:04 UTC
I'll take this, though I'm surprised to see this is failing because we tried to do exhaustive downstream tests in this very scenario prior to submitting the patches upstream.

Comment 3 Alan Bishop 2023-05-04 20:51:22 UTC
Marian, your analysis is correct, but I have a slightly different fix in mind. It turns out the contents of the kolla JSON files are the same for both the public and internal API services, which means both should be able to use the same files. Here's my patch:

diff --git a/deployment/glance/glance-api-internal-container-puppet.yaml b/deployment/glance/glance-api-internal-container-puppet.yaml
index 15fab9d14..b6469fce5 100644
--- a/deployment/glance/glance-api-internal-container-puppet.yaml
+++ b/deployment/glance/glance-api-internal-container-puppet.yaml
@@ -133,14 +133,6 @@ outputs:
                   - {get_attr: [MySQLClient, role_data, step_config]}
             config_image: {get_attr: [RoleParametersValue, value, ContainerGlanceApiInternalConfigImage]}
 
-          kolla_config:
-            # The kolla_config are essentially the same as the GlanceApi service.
-            # The only difference is the json file names.
-            /var/lib/kolla/config_files/glance_api_internal.json:
-              {get_attr: [GlanceApi, role_data, kolla_config, /var/lib/kolla/config_files/glance_api.json]}
-            /var/lib/kolla/config_files/glance_api_internal_tls_proxy.json:
-              {get_attr: [GlanceApi, role_data, kolla_config, /var/lib/kolla/config_files/glance_api_tls_proxy.json]}
-
           docker_config:
             step_2:
               get_attr: [GlanceLogging, docker_config, step_2]

The patch works in my own test environment, and *should* also fix it in a DCN deployment. It would be great if you could verify it works for you, in which case I'll submit the patch upstream.

Comment 4 Marian Krcmarik 2023-05-05 16:45:23 UTC
> The patch works in my own test environment, and *should* also fix it in a
> DCN deployment. It would be great if you could verify it works for you, in
> which case I'll submit the patch upstream.

It works in DCN d/s CI as well, and it fixes the problem neatly. I think we can submit it upstream, thanks!

Comment 18 errata-xmlrpc 2023-08-16 01:14:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

