Bug 2321715

Summary: openstack undercloud install on 16.2-latest aborts with KeyError and exception (late Oct 2024)
Product: Red Hat OpenStack
Component: openstack-tripleo-common
Version: 16.2 (Train)
Status: CLOSED MIGRATED
Severity: high
Priority: high
Target Milestone: async
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: x86_64
OS: Linux
Reporter: Vincent S. Cojot <vcojot>
Assignee: Rabi Mishra <ramishra>
QA Contact: David Rosenfeld <drosenfe>
CC: dhughes, kgilliga, mburns, pweeks, ramishra, slinaber, toneata
Keywords: Regression, Triaged
Fixed In Version: openstack-tripleo-common-11.7.1-2.20241105125025.e189622.el8ost
Last Closed: 2024-12-19 18:26:13 UTC
Type: Bug
Bug Blocks: 2322348, 2322349, 2323788

Description Vincent S. Cojot 2024-10-25 08:59:50 UTC
Description of problem:

'openstack undercloud install' last worked fine a month ago (Sept 11th 2024).
Now, with the latest RPMs, I get a Python error.
Config files have not changed in several years.

Sept 11th 2024:
** Handling template files **
jinja2 rendering normal template net-config-bond.j2.yaml
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./net-config-bond.yaml
[..]
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./extraconfig/all_nodes/swap.yaml
jinja2 rendering role template role.role.j2.yaml
jinja2 rendering roles Undercloud
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./extraconfig/nova_metadata/krb-service-principals/undercloud-role.yaml
jinja2 rendering network template network.network.j2.yaml
jinja2 rendering networks External
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./network/external.yaml

I now (2024/10/24) get this:
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./network/ports/external_from_pool_v6.yaml
jinja2 rendering network template port_v6.network.j2.yaml
jinja2 rendering networks External
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./network/ports/external_v6.yaml
jinja2 rendering role template role.role.j2.yaml
jinja2 rendering roles Undercloud
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./puppet/undercloud-role.yaml
Exception: 'layers'
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1297, in _standalone_deploy
    parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 814, in _deploy_tripleo_heat_templates
    self._prepare_container_images(env, roles_data)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 759, in _prepare_container_images
    env, roles_data, dry_run=True)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/kolla_builder.py", line 228, in container_images_prepare_multi
    lock=lock
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/kolla_builder.py", line 357, in container_images_prepare
    images, tag_from_label, default_tag)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 1142, in discover_image_tags
    discover_args):
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 2779, in discover_tag_from_inspect
    i = self._inspect(image_url, session=session, default_tag=default_tag)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 2606, in _inspect
    image_url, session=session, default_tag=default_tag)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
    return self.call(f, *args, **kw)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
    do = self.iter(retry_state=retry_state)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 319, in iter
    return fut.result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 956, in _inspect
    layers = [x['digest'] for x in manifest['layers']]
KeyError: 'layers'
None
Install artifact is located at /home/stack/undercloud-install-20241025085006.tar.bzip2

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Deployment Failed!

ERROR: Heat log files: /var/log/heat-launcher/undercloud_deploy-395n1egj

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Deployment failed.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
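
The last frame fails on manifest['layers'], which means the manifest JSON returned by the registry had no top-level 'layers' key. One document shape that lacks it is a manifest list (image index), which carries a 'manifests' array instead. Whether that is what the Satellite returned here is an assumption, not something this report confirms. A minimal Python sketch of a defensive check, using the hypothetical helper name layer_digests and made-up digests:

```python
def layer_digests(manifest):
    """Return the layer digests of a single-image manifest.

    Hypothetical helper for illustration, not the tripleo-common code.
    """
    if 'layers' in manifest:
        return [layer['digest'] for layer in manifest['layers']]
    if 'manifests' in manifest:
        # A manifest list / image index points at per-platform manifests
        # and has no layers of its own.
        raise ValueError(
            'got a manifest list with %d entries, expected an image manifest'
            % len(manifest['manifests']))
    raise ValueError('unrecognised manifest, keys: %s' % sorted(manifest))


# Illustrative documents with made-up digests.
image_manifest = {
    'schemaVersion': 2,
    'layers': [{'digest': 'sha256:aaa'}, {'digest': 'sha256:bbb'}],
}
manifest_list = {
    'schemaVersion': 2,
    'manifests': [{'digest': 'sha256:ccc'}],
}

print(layer_digests(image_manifest))
try:
    layer_digests(manifest_list)
except ValueError as exc:
    print('error:', exc)
```

A check like this turns the bare KeyError into a message that names the unexpected document type.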

Comment 1 Vincent S. Cojot 2024-10-25 09:18:01 UTC
The sosreport is too big to attach.

$ cat undercloud.conf
[DEFAULT]
image_path = /home/stack/images/
container_images_file = /home/stack/OSP/osp16.2/containers-prepare-parameter.yaml
enabled_hardware_types = ipmi,redfish,idrac
enabled_boot_interfaces = pxe
enabled_deploy_interfaces = iscsi,direct
enabled_management_interfaces = ipmitool
enabled_power_interfaces = ipmitool
discovery_default_driver = ipmi
undercloud_enable_selinux = true
undercloud_debug = false

# Local site
# undercloud_hostname = osp16h.lasthome.solace.krynn
overcloud_domain_name = lasthome.solace.krynn
subnets = ctlplane-subnet
local_subnet = ctlplane-subnet
undercloud_nameservers = 10.0.128.234,10.0.128.236,10.0.128.254
undercloud_ntp_servers = 10.20.0.1,10.0.128.234,10.0.128.236,10.0.128.254
hieradata_override = /home/stack/OSP/osp16.2/undercloud-override.yaml

# Certificates
generate_service_certificate = true
certificate_generation_ca = local
# undercloud_service_certificate = /etc/pki/instack-certs/undercloud.pem

# Network
local_ip = 10.20.0.2/24
# undercloud_public_host = 10.0.129.42
undercloud_public_host = 10.20.0.3
undercloud_admin_host = 10.20.0.4
local_interface = bond0
inspection_interface = br-ctlplane
inspection_iprange = 10.20.0.200,10.20.0.240

# Features
enable_mistral = true
enable_zaqar = true
enable_telemetry = false
enable_tempest = true
inspection_extras = true
enable_validations = true
clean_nodes = false
enable_node_discovery = true
ipxe_enabled = true
# inspection_enable_uefi = true

[ctlplane-subnet]
cidr = 10.20.0.0/24
dhcp_start = 10.20.0.101
dhcp_end = 10.20.0.164
inspection_iprange = 10.20.0.200,10.20.0.240
gateway = 10.20.0.1
masquerade = true

Comment 2 Vincent S. Cojot 2024-10-25 09:18:47 UTC
$ cat /home/stack/OSP/osp16.2/undercloud-override.yaml
# HAProxy timeouts
tripleo::haproxy::ssl_cipher_suite: "!SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!DES:!MEDIUM"
tripleo::haproxy::ssl_options: 'no-sslv3 no-tls-tickets'
tripleo::haproxy::haproxy_global_maxconn: 32768
tripleo::haproxy::haproxy_default_maxconn: 8192
# Keystone token expiry (for updates on large clouds):
keystone::token_expiration: 14400
# MySQL
mysql_max_connections: '8192'
#
nova::compute::ironic::max_concurrent_builds: 8
# Ironic cleaning
ironic::conductor::cleaning_disk_erase: metadata
ironic::conductor::cleaning_network: ctlplane

Comment 12 Vincent S. Cojot 2024-11-04 13:35:15 UTC
With the following change it no longer aborts:

$ diff -u /home/stack/OSP/osp16.2/containers-prepare-parameter.yaml.orig /home/stack/OSP/osp16.2/containers-prepare-parameter.yaml
--- /home/stack/OSP/osp16.2/containers-prepare-parameter.yaml.orig      2024-10-30 17:40:09.990000000 -0400
+++ /home/stack/OSP/osp16.2/containers-prepare-parameter.yaml   2024-11-04 08:32:55.365000000 -0500
@@ -1,7 +1,6 @@
 parameter_defaults:
   ContainerImagePrepare:
-  - tag_from_label: "{version}-{release}"
-    push_destination: false
+  - push_destination: false
     set:
       ceph_alertmanager_image: krynn_rhosp-osp16_containers-ose-prometheus-alertmanager
       ceph_alertmanager_namespace: sat6.lasthome.solace.krynn
@@ -22,7 +21,7 @@
       name_suffix: ''
       namespace: sat6.lasthome.solace.krynn
       neutron_driver: ovs
-      #tag: 16.2
+      tag: 16.2
       rhel_containers: false
     excludes:
       - ose-prometheus

The problem is that many Telco customers use tag_from_label rather than a fixed tag.
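
For context, tag_from_label: "{version}-{release}" asks the tooling to derive each image's tag from the image's own labels instead of using one fixed tag. A minimal Python sketch of that expansion, assuming the template is filled with str.format over the label dictionary. The function name tag_from_labels and the label values are illustrative, not taken from this environment:

```python
def tag_from_labels(template, labels):
    """Expand a tag template such as "{version}-{release}" from image labels."""
    try:
        return template.format(**labels)
    except KeyError as exc:
        # The template references a label the image does not carry.
        raise ValueError('image has no label %s' % exc)


# Illustrative labels; real values come from inspecting the image.
labels = {'version': '16.2.6', 'release': '12'}
print(tag_from_labels('{version}-{release}', labels))  # 16.2.6-12
```

Because the labels are only known after each image has been inspected, this mode depends on the manifest inspection code, which would explain why switching to a fixed tag sidesteps the failure.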

Comment 13 Vincent S. Cojot 2024-11-04 13:36:29 UTC
The satellite:

[root@sat6 ~]# rpm -q satellite
satellite-6.14.4.3-1.el8sat.noarch

Comment 15 Vincent S. Cojot 2024-11-04 15:22:17 UTC
Copied the patch from my Linux box and it applied just fine, darn Windows.

[root@osp16d /usr/lib/python3.6/site-packages]# patch -p1 < ~stack/p3.txt
patching file tripleo_common/image/image_uploader.py

Comment 16 Vincent S. Cojot 2024-11-04 16:42:01 UTC
With the patch in place, I got this:
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./network/ports/external_from_pool_v6.yaml
jinja2 rendering network template port_v6.network.j2.yaml
jinja2 rendering networks External
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./network/ports/external_v6.yaml
jinja2 rendering role template role.role.j2.yaml
jinja2 rendering roles Undercloud
rendering j2 template to file: /home/stack/tripleo-heat-installer-templates/./puppet/undercloud-role.yaml
Exception: 404 Client Error: Not Found for url: https://sat6.lasthome.solace.krynn/v2/krynn_rhosp-osp16_containers-openstack-cron/blobs/sha256:c4ae84d99c69a4001389c3ea2cceb774c987e553fe41b3c95a047277ef5b1c61
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1297, in _standalone_deploy
    parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 814, in _deploy_tripleo_heat_templates
    self._prepare_container_images(env, roles_data)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 759, in _prepare_container_images
    env, roles_data, dry_run=True)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/kolla_builder.py", line 228, in container_images_prepare_multi
    lock=lock
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/kolla_builder.py", line 357, in container_images_prepare
    images, tag_from_label, default_tag)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 1227, in discover_image_tags
    discover_args):
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 2795, in discover_tag_from_inspect
    i = self._inspect(image_url, session=session, default_tag=default_tag)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 2622, in _inspect
    image_url, session=session, default_tag=default_tag)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
    return self.call(f, *args, **kw)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
    do = self.iter(retry_state=retry_state)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 331, in iter
    raise retry_exc.reraise()
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 167, in reraise
    raise self.last_attempt.result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 1055, in _inspect
    allow_redirects=False
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 442, in get
    **kwargs)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
    return self.call(f, *args, **kw)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
    do = self.iter(retry_state=retry_state)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 319, in iter
    return fut.result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 421, in _action
    request=req)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 262, in check_status
    request.raise_for_status()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://sat6.lasthome.solace.krynn/v2/krynn_rhosp-osp16_containers-openstack-cron/blobs/sha256:c4ae84d99c69a4001389c3ea2cceb774c987e553fe41b3c95a047277ef5b1c61
None
Install artifact is located at /home/stack/undercloud-install-20241104164033.tar.bzip2

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Deployment Failed!

ERROR: Heat log files: /var/log/heat-launcher/undercloud_deploy-xuwhq2mx
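
The 404 is for a /blobs/ URL. If the patched code is now walking a manifest list (an assumption; the patch itself is not quoted in this report), each entry's digest names a per-platform manifest rather than a layer or config blob. A minimal Python sketch of selecting the platform entry, using the manifest-list layout from the Docker and OCI image specifications. The helper name and digests are hypothetical:

```python
def select_platform_manifest(manifest_list, architecture='amd64',
                             os_name='linux'):
    """Return the digest of the entry matching the requested platform."""
    for entry in manifest_list.get('manifests', []):
        platform = entry.get('platform', {})
        if (platform.get('architecture') == architecture
                and platform.get('os') == os_name):
            return entry['digest']
    raise LookupError('no manifest for %s/%s' % (os_name, architecture))


# Made-up manifest list for illustration.
manifest_list = {
    'schemaVersion': 2,
    'mediaType': 'application/vnd.docker.distribution.manifest.list.v2+json',
    'manifests': [
        {'digest': 'sha256:1111',
         'platform': {'architecture': 'ppc64le', 'os': 'linux'}},
        {'digest': 'sha256:2222',
         'platform': {'architecture': 'amd64', 'os': 'linux'}},
    ],
}

print(select_platform_manifest(manifest_list))  # sha256:2222
```

The digest returned this way would then be fetched as a manifest, not as a blob.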

Comment 17 Vincent S. Cojot 2024-11-04 17:01:17 UTC
This file used to work:



parameter_defaults:
  ContainerImagePrepare:
  - tag_from_label: "{version}-{release}"
    push_destination: false
    set:
      ceph_alertmanager_image: krynn_rhosp-osp16_containers-ose-prometheus-alertmanager
      ceph_alertmanager_namespace: sat6.lasthome.solace.krynn
      ceph_alertmanager_tag: 4.1
      ceph_grafana_image: krynn_rhosp-osp16_containers-ose-grafana
      ceph_grafana_namespace: sat6.lasthome.solace.krynn
      ceph_grafana_tag: 4.1
      ceph_node_exporter_image: krynn_rhosp-osp16_containers-ose-prometheus-node-exporter
      ceph_node_exporter_namespace: sat6.lasthome.solace.krynn
      ceph_node_exporter_tag: v4.1
      ceph_prometheus_image: krynn_rhosp-osp16_containers-ose-prometheus
      ceph_prometheus_namespace: sat6.lasthome.solace.krynn
      ceph_prometheus_tag: 4.1
      ceph_image: krynn_rhosp-osp16_containers-rhceph-4-rhel8
      ceph_namespace: sat6.lasthome.solace.krynn
      ceph_tag: latest
      name_prefix: krynn_rhosp-osp16_containers-openstack-
      name_suffix: ''
      namespace: sat6.lasthome.solace.krynn
      neutron_driver: ovs
      #tag: 16.2
      rhel_containers: false
    excludes:
      - ose-prometheus
      - ose-prometheus-alertmanager
      - ose-prometheus-node-exporter
  ContainerImageRegistryCredentials:
    registry.redhat.io:
      6340056|vcojot-rhosp: eyJXXXXXXXXXXXXXXXXXXXXXX
  ContainerImageRegistryLogin: true

I can pull the images by tag with or without auth:

[stack@osp16d ~]$ podman  login  sat6.lasthome.solace.krynn
Authenticating with existing credentials...
Existing credentials are valid. Already logged in to sat6.lasthome.solace.krynn

[stack@osp16d ~]$ podman pull sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron:16.2.6
Trying to pull sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron:16.2.6...
Getting image source signatures
Copying blob bbace1e08f3e skipped: already exists
Copying blob e0d0fb6418b1 skipped: already exists
Copying blob e627f7a5737e [--------------------------------------] 0.0b / 0.0b
Copying config 7ed88ede9c done
Writing manifest to image destination
Storing signatures
7ed88ede9c7ed549c930b70bba242217385984ca369e0badbe4edffa5588ff2b

I can list tags:
[root@osp16d ~]#  skopeo list-tags docker://sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron|grep '16.2'|wc -l
47
[root@osp16d ~]#  skopeo list-tags docker://sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron|grep 'sha256'|wc -l
53
[root@osp16d ~]#
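
skopeo list-tags prints a JSON object with Repository and Tags fields, one tag per line, so the grep | wc -l pipelines above count matching tags. A small Python sketch of the same count, run on a made-up sample rather than this registry's real output (note that grep treats '.' as a wildcard, so a plain substring count can differ slightly):

```python
import json


def count_tags(list_tags_json, needle):
    """Count tags containing the given substring."""
    tags = json.loads(list_tags_json)['Tags']
    return sum(1 for tag in tags if needle in tag)


# Hypothetical sample in the shape skopeo list-tags emits.
sample = json.dumps({
    'Repository': 'registry.example.com/openstack-cron',
    'Tags': ['16.2', '16.2.6', '16.2.6-12', 'sha256-abc', 'latest'],
})

print(count_tags(sample, '16.2'))    # 3
print(count_tags(sample, 'sha256'))  # 1
```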

I can pull (unauthenticated) by digest:
[root@osp16d ~]# podman logout sat6.lasthome.solace.krynn
Error: Not logged into sat6.lasthome.solace.krynn

[root@osp16d ~]# podman pull docker://sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron@sha256:10ab52e021b436d222cdba881825e6e1b3d7f11b96ff169ad36cb3ad412102d2
Trying to pull sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron@sha256:10ab52e021b436d222cdba881825e6e1b3d7f11b96ff169ad36cb3ad412102d2...
Getting image source signatures
Copying blob b3b93d72e803 skipped: already exists
Copying blob c3b2eb007e57 [--------------------------------------] 0.0b / 0.0b
Copying blob 90d946b542da [--------------------------------------] 0.0b / 0.0b
Copying config 7a79f2246a done
Writing manifest to image destination
Storing signatures
7a79f2246a1686eafec699ba5ba34ca5ac78adad154623d80ed27321cac36bc2


I can even pull the digest that tripleo was trying to use:
[root@osp16d ~]# podman pull docker://sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron@sha256:c4ae84d99c69a4001389c3ea2cceb774c987e553fe41b3c95a047277ef5b1c61
Trying to pull sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron@sha256:c4ae84d99c69a4001389c3ea2cceb774c987e553fe41b3c95a047277ef5b1c61...
Getting image source signatures
Copying blob e627f7a5737e done
Copying blob bbace1e08f3e done
Copying blob e0d0fb6418b1 done
Copying config 7ed88ede9c done
Writing manifest to image destination
Storing signatures
7ed88ede9c7ed549c930b70bba242217385984ca369e0badbe4edffa5588ff2b
[root@osp16d ~]#

So pulling the image by digest works:
sat6.lasthome.solace.krynn/krynn_rhosp-osp16_containers-openstack-cron@sha256:c4ae84d99c69a4001389c3ea2cceb774c987e553fe41b3c95a047277ef5b1c61

but tripleo failed on the blobs URL for the same digest:
sat6.lasthome.solace.krynn/v2/krynn_rhosp-osp16_containers-openstack-cron/blobs/sha256:c4ae84d99c69a4001389c3ea2cceb774c987e553fe41b3c95a047277ef5b1c61
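
The two references differ in the registry endpoint, not in the digest. In the Registry HTTP API v2, manifests are served from /v2/<name>/manifests/<reference> and layer or config blobs from /v2/<name>/blobs/<digest>. A pull by @sha256:... resolves the digest through the manifests endpoint. A small Python sketch of the two URLs (the helper names are illustrative):

```python
def manifest_url(registry, name, reference):
    """URL of a manifest, addressed by tag or digest."""
    return 'https://%s/v2/%s/manifests/%s' % (registry, name, reference)


def blob_url(registry, name, digest):
    """URL of a layer or config blob, addressed by digest."""
    return 'https://%s/v2/%s/blobs/%s' % (registry, name, digest)


registry = 'sat6.lasthome.solace.krynn'
name = 'krynn_rhosp-osp16_containers-openstack-cron'
digest = ('sha256:c4ae84d99c69a4001389c3ea2cceb774'
          'c987e553fe41b3c95a047277ef5b1c61')

print(manifest_url(registry, name, digest))
print(blob_url(registry, name, digest))
```

If the digest identifies a manifest, a registry may answer the first URL and return 404 for the second, which would match the behaviour seen here.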

Comment 18 Vincent S. Cojot 2024-11-05 10:13:46 UTC
https://paste.openstack.org/raw/bYjcoaNlNkNvhXusHP2j/

I can confirm that with the latest patch above the install completes without issues.

Install artifact is located at /home/stack/undercloud-install-20241105082119.tar.bzip2

########################################################

Deployment successful!

########################################################

$ cat /home/stack/OSP/osp16.2/containers-prepare-parameter.yaml
parameter_defaults:
  ContainerImagePrepare:
  - tag_from_label: "{version}-{release}"
    push_destination: false
    set:
      ceph_alertmanager_image: krynn_rhosp-osp16_containers-ose-prometheus-alertmanager
      ceph_alertmanager_namespace: sat6.lasthome.solace.krynn
      ceph_alertmanager_tag: 4.1
      ceph_grafana_image: krynn_rhosp-osp16_containers-ose-grafana
      ceph_grafana_namespace: sat6.lasthome.solace.krynn
      ceph_grafana_tag: 4.1
      ceph_node_exporter_image: krynn_rhosp-osp16_containers-ose-prometheus-node-exporter
      ceph_node_exporter_namespace: sat6.lasthome.solace.krynn
      ceph_node_exporter_tag: v4.1
      ceph_prometheus_image: krynn_rhosp-osp16_containers-ose-prometheus
      ceph_prometheus_namespace: sat6.lasthome.solace.krynn
      ceph_prometheus_tag: 4.1
      ceph_image: krynn_rhosp-osp16_containers-rhceph-4-rhel8
      ceph_namespace: sat6.lasthome.solace.krynn
      ceph_tag: latest
      name_prefix: krynn_rhosp-osp16_containers-openstack-
      name_suffix: ''
      namespace: sat6.lasthome.solace.krynn
      neutron_driver: ovs
      rhel_containers: false
    excludes:
      - ose-prometheus
      - ose-prometheus-alertmanager
      - ose-prometheus-node-exporter
  ContainerImageRegistryCredentials:
    registry.redhat.io:
      6340056|vcojot-rhosp: eyJhbGciO............