Description of problem:

The glance-api service fails to deploy in a DCN deployment on the DCN site, specifically on DistributedComputeHCI nodes. The actual error messages are:

2023-04-19 19:57:45.936127 | | WARNING | ERROR: Can't run container glance_api_internal stderr: Error: statfs /var/lib/kolla/config_files/glance_api.json: no such file or directory
2023-04-19 19:57:45.942951 | | WARNING | ERROR: Can't run container glance_api_internal_tls_proxy stderr: Error: statfs /var/lib/kolla/config_files/glance_api_tls_proxy.json: no such file or directory
2023-04-19 19:57:45.948045 | 52540056-ef02-9140-41c7-00000000c797 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn1-computehci-1 | error={"changed": false, "msg": "Failed containers: glance_api_internal, glance_api_internal_tls_proxy"}
2023-04-19 19:57:45.964199 | 52540056-ef02-9140-41c7-00000000c797 | TIMING | tripleo_container_manage : Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn1-computehci-1 | 0:42:21.014482 | 13.12s
2023-04-19 19:57:46.003735 | | WARNING | ERROR: Can't run container glance_api_internal stderr: Error: statfs /var/lib/kolla/config_files/glance_api.json: no such file or directory
2023-04-19 19:57:46.005840 | | WARNING | ERROR: Can't run container glance_api_internal_tls_proxy stderr: Error: statfs /var/lib/kolla/config_files/glance_api_tls_proxy.json: no such file or directory
2023-04-19 19:57:46.007860 | 52540056-ef02-9140-41c7-00000000c700 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn1-computehci-0 | error={"changed": false, "msg": "Failed containers: glance_api_internal, glance_api_internal_tls_proxy"}

The problem seems to be that the files specified as container volume mount sources are missing, specifically:

/var/lib/kolla/config_files/glance_api_tls_proxy.json
/var/lib/kolla/config_files/glance_api.json

If I log in on the node I can see the following:

[tripleo-admin@dcn1-computehci-0 ~]$ ls /var/lib/kolla/config_files/ | grep glance
glance_api_internal.json
glance_api_internal_tls_proxy.json

But the volume mount points look like:

[root@dcn1-computehci-0 tripleo-admin]# cat /var/lib/tripleo-config/container-startup-config/step_4/glance_api_internal_tls_proxy.json
{
  "environment": {
    "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS",
    "TRIPLEO_CONFIG_HASH": "5ea6953245003848ac83fd667e3a957c"
  },
  "image": "site-undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-glance-api:17.1_20230404.1",
  "net": "host",
  "restart": "always",
  "start_order": 3,
  "user": "root",
  "volumes": [
    "/etc/hosts:/etc/hosts:ro",
    "/etc/localtime:/etc/localtime:ro",
    "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro",
    "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro",
    "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro",
    "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro",
    "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro",
    "/dev/log:/dev/log",
    "/etc/ipa/ca.crt:/etc/ipa/ca.crt:ro",
    "/etc/puppet:/etc/puppet:ro",
    "/var/log/containers/glance:/var/log/glance:z",
    "/var/log/containers/httpd/glance:/var/log/httpd:z",
    "/var/lib/kolla/config_files/glance_api_tls_proxy.json:/var/lib/kolla/config_files/config.json:ro",
    "/var/lib/config-data/puppet-generated/glance_api_internal:/var/lib/kolla/config_files/src:ro",
    "/etc/pki/tls/certs/httpd:/etc/pki/tls/certs/httpd:ro",
    "/etc/pki/tls/private/httpd:/etc/pki/tls/private/httpd:ro"
  ]
}

When the container is started, it looks for /var/lib/kolla/config_files/glance_api_tls_proxy.json and not /var/lib/kolla/config_files/glance_api_internal_tls_proxy.json, which is the file that was generated. This would work on Controller nodes because the THT role for the Controller node includes OS::TripleO::Services::GlanceApi, which causes /var/lib/kolla/config_files/glance_api_tls_proxy.json to be generated.
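Podman's statfs error above simply means a bind-mount source path does not exist on the host. A minimal sketch (a hypothetical helper, not part of TripleO) that checks a startup-config file for missing mount sources:

```python
import json
import os


def missing_mount_sources(startup_config_path):
    """Return the host-side bind-mount sources that do not exist.

    Each entry in the "volumes" list has the form
    "<host path>:<container path>[:<options>]"; the host path is the
    part before the first colon.
    """
    with open(startup_config_path) as f:
        config = json.load(f)
    missing = []
    for vol in config.get("volumes", []):
        host_path = vol.split(":")[0]
        if not os.path.exists(host_path):
            missing.append(host_path)
    return missing
```

Run against the step_4 config shown above on a DistributedComputeHCI node, this would report /var/lib/kolla/config_files/glance_api_tls_proxy.json, matching the statfs error.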
The DistributedComputeHCI role includes only the GlanceApiEdge service.

To work around the problem I manually made the following change in THT:

diff --git a/deployment/glance/glance-api-internal-container-puppet.yaml b/deployment/glance/glance-api-internal-container-puppet.yaml
--- a/deployment/glance/glance-api-internal-container-puppet.yaml	(revision 1393d39be367db3acb02508e0e858395a4e4fefa)
+++ b/deployment/glance/glance-api-internal-container-puppet.yaml	(date 1682024903117)
@@ -152,7 +152,7 @@
           - get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api]
           - volumes:
               yaql:
-                expression: $.data.vols.select($.replace('puppet-generated/glance_api', 'puppet-generated/glance_api_internal'))
+                expression: $.data.vols.select($.replace('glance_api', 'glance_api_internal'))
                 data:
                   vols: {get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api, volumes]}
         glance_api_internal_tls_proxy:
@@ -162,7 +162,7 @@
           - get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api_tls_proxy]
           - volumes:
               yaql:
-                expression: $.data.vols.select($.replace('puppet-generated/glance_api', 'puppet-generated/glance_api_internal'))
+                expression: $.data.vols.select($.replace('glance_api', 'glance_api_internal'))
                 data:
                   vols: {get_attr: [GlanceApi, role_data, docker_config, step_4, glance_api_tls_proxy, volumes]}

But I am not sure what the right way to fix it is. I can provide an env if needed, and I may be wrong about the root cause.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-14.3.1-1.20230402010807.563f2cd.el9ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy a site in a DCN env with glance multistore, i.e. with glance-api deployed on the edge site on DistributedComputeHCI nodes.
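The effect of the two yaql expressions in the workaround can be emulated in plain Python (a sketch; `select($.replace(...))` over a list of strings behaves like a per-element string replace):

```python
# Volume entries as rendered for the internal glance-api containers
# (abbreviated to the two entries that matter).
volumes = [
    "/var/lib/kolla/config_files/glance_api_tls_proxy.json:/var/lib/kolla/config_files/config.json:ro",
    "/var/lib/config-data/puppet-generated/glance_api:/var/lib/kolla/config_files/src:ro",
]

# Original expression: only the puppet-generated path is renamed, so the
# kolla json bind-mount still points at the file name that only the
# Controller-side GlanceApi service generates.
narrow = [
    v.replace("puppet-generated/glance_api", "puppet-generated/glance_api_internal")
    for v in volumes
]

# Workaround expression: every occurrence of 'glance_api' is renamed,
# including the kolla json file name that actually exists on the node.
broad = [v.replace("glance_api", "glance_api_internal") for v in volumes]
```

With the narrow expression the first entry is left untouched (hence the statfs failure on DistributedComputeHCI nodes); with the broad one it points at the generated glance_api_internal_tls_proxy.json.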
Actual results:
The site deployment fails because the glance-api containers fail to start.

Expected results:
Successful deployment of the DCN site.

Additional info:
The Glance internal API was introduced downstream in 17.1; upstream release note:
https://github.com/openstack/tripleo-heat-templates/blob/master/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml
I'll take this, though I'm surprised to see it failing, because we ran exhaustive downstream tests of this very scenario before submitting the patches upstream.
Marian, your analysis is correct, but I have a slightly different fix in mind. It turns out the kolla json file contents are the same for both the public and internal API services, which means they should both be able to use the same file. Here's my patch:

diff --git a/deployment/glance/glance-api-internal-container-puppet.yaml b/deployment/glance/glance-api-internal-container-puppet.yaml
index 15fab9d14..b6469fce5 100644
--- a/deployment/glance/glance-api-internal-container-puppet.yaml
+++ b/deployment/glance/glance-api-internal-container-puppet.yaml
@@ -133,14 +133,6 @@ outputs:
           - {get_attr: [MySQLClient, role_data, step_config]}
         config_image: {get_attr: [RoleParametersValue, value, ContainerGlanceApiInternalConfigImage]}
-      kolla_config:
-        # The kolla_config are essentially the same as the GlanceApi service.
-        # The only difference is the json file names.
-        /var/lib/kolla/config_files/glance_api_internal.json:
-          {get_attr: [GlanceApi, role_data, kolla_config, /var/lib/kolla/config_files/glance_api.json]}
-        /var/lib/kolla/config_files/glance_api_internal_tls_proxy.json:
-          {get_attr: [GlanceApi, role_data, kolla_config, /var/lib/kolla/config_files/glance_api_tls_proxy.json]}
-
       docker_config:
         step_2:
           get_attr: [GlanceLogging, docker_config, step_2]

The patch works in my own test environment, and *should* also fix it in a DCN deployment. It would be great if you could verify it works for you, in which case I'll submit the patch upstream.
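The premise of the fix is that the two rendered kolla config files are semantically identical, so mounting one in place of the other changes nothing. A small helper to check that on a node (hypothetical, not part of TripleO; key order in the json files is irrelevant):

```python
import json


def same_kolla_config(path_a, path_b):
    """True if two kolla config.json files have identical contents.

    Comparison is done on the parsed JSON, so formatting and key
    ordering differences do not matter.
    """
    with open(path_a) as a, open(path_b) as b:
        return json.load(a) == json.load(b)
```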
> The patch works in my own test environment, and *should* also fix it in a
> DCN deployment. It would be great if you could verify it works for you, in
> which case I'll submit the patch upstream.

It works in DCN downstream CI as well, and it fixes the problem neatly. I think we can submit it upstream, thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577