Description of problem:

The OSP FFU upgrade (16.2 -> 17.1) of a DCN stack (multistack; the central site with controller nodes upgraded successfully) with etcd deployed (to manage the Cinder A/A service on the DCN site) fails with the following error:

FATAL | Pre-fetch all the containers | dcn1-computehci1-1 | item=site-undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-etcd:17.1_20231116.2 | error={"ansible_loop_var": "prefetch_image", "attempts": 5, "changed": false, "msg": "Failed to pull image site-undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-etcd:17.1_20231116.2", "prefetch_image": "site-undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-etcd:17.1_20231116.2"

The deployment uses the registry on the undercloud, and the proper etcd image (for 17.1) is not available there. Only the 16.2 version of the etcd image is present, which was used for the initial 16.2 deployment. Note: the central site does not require etcd and is upgraded successfully.

The undercloud config has the following parameters set:

container_images_file = /home/stack/containers-prepare-parameter.yaml
container_insecure_registries = brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888,rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002,registry-proxy.engineering.redhat.com

The containers-prepare-parameter.yaml looks like:

parameter_defaults:
  ContainerImagePrepare:
  - tag_from_label: '{version}-{release}'
    set:
      namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      name_prefix: rhosp17-openstack-
      name_suffix: ''
      tag: 17.1_20231116.2
      rhel_containers: false
      neutron_driver: ovn
      ceph_namespace: registry-proxy.engineering.redhat.com/rh-osbs
      ceph_image: rhceph
      ceph_tag: '5'
      ceph_prometheus_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_prometheus_image: openshift-ose-prometheus
      ceph_prometheus_tag: v4.10
      ceph_alertmanager_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_alertmanager_image: openshift-ose-prometheus-alertmanager
      ceph_alertmanager_tag: v4.10
      ceph_node_exporter_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_node_exporter_image: openshift-ose-prometheus-node-exporter
      ceph_node_exporter_tag: v4.10
      ceph_grafana_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_grafana_image: grafana
      ceph_grafana_tag: '5'
    push_destination: true
  MultiRhelRoleContainerImagePrepare: &id001
  - tag_from_label: '{version}-{release}'
    set:
      namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      name_prefix: rhosp17-openstack-
      name_suffix: ''
      tag: 17.1_20231116.2
      rhel_containers: false
      neutron_driver: ovn
      ceph_namespace: registry-proxy.engineering.redhat.com/rh-osbs
      ceph_image: rhceph
      ceph_tag: '5'
      ceph_prometheus_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_prometheus_image: openshift-ose-prometheus
      ceph_prometheus_tag: v4.10
      ceph_alertmanager_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_alertmanager_image: openshift-ose-prometheus-alertmanager
      ceph_alertmanager_tag: v4.10
      ceph_node_exporter_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_node_exporter_image: openshift-ose-prometheus-node-exporter
      ceph_node_exporter_tag: v4.10
      ceph_grafana_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_grafana_image: grafana
      ceph_grafana_tag: '5'
    push_destination: true
    excludes:
    - collectd
    - nova-libvirt
  - tag_from_label: '{version}-{release}'
    set:
      namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      name_prefix: rhosp17-openstack-
      name_suffix: ''
      tag: 17.1_20231116.1
      rhel_containers: false
      neutron_driver: ovn
      ceph_namespace: registry-proxy.engineering.redhat.com/rh-osbs
      ceph_image: rhceph
      ceph_tag: '5'
      ceph_prometheus_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_prometheus_image: openshift-ose-prometheus
      ceph_prometheus_tag: v4.10
      ceph_alertmanager_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_alertmanager_image: openshift-ose-prometheus-alertmanager
      ceph_alertmanager_tag: v4.10
      ceph_node_exporter_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_node_exporter_image: openshift-ose-prometheus-node-exporter
      ceph_node_exporter_tag: v4.10
      ceph_grafana_namespace: rhos-qe-mirror.lab.eng.tlv2.redhat.com:5002/rh-osbs
      ceph_grafana_image: grafana
      ceph_grafana_tag: '5'
    push_destination: true
    includes:
    - collectd
    - nova-libvirt
  ComputeHCI1ContainerImagePrepare: *id001
  ComputeHCIScaleOut1ContainerImagePrepare: *id001

The executed procedure runs the "openstack overcloud upgrade prepare" command for each stack, followed by "openstack overcloud external-upgrade run ${EXTERNAL_ANSWER} --stack ${STACK} --tags container_image_prepare". The latter does not seem to do anything for the DCN stack, whereas it does prepare and upload images for the central stack.

The service responsible for etcd is OS::TripleO::Services::Etcd, and it is included in the roles file passed on the upgrade prepare command line.

It seems that the generated playbook external_deploy_steps_tasks_step1.yaml (in overcloud-deploy/dcn1/config-download/dcn1/) does not have the task "Run tripleo-container-image-prepare role", while the same playbook for the central stack has it. This means that no container image prepare is executed for dcn1, while it is executed for the central stack.
(undercloud) [stack@site-undercloud-0 overcloud-deploy]$ fgrep "Run tripleo-container-image-prepare" central/config-download/central/external_deploy_steps_tasks_step1.yaml
    name: Run tripleo-container-image-prepare role
(undercloud) [stack@site-undercloud-0 overcloud-deploy]$ fgrep "Run tripleo-container-image-prepare" dcn1/config-download/dcn1/external_deploy_steps_tasks_step1.yaml
(undercloud) [stack@site-undercloud-0 overcloud-deploy]$

Once I upload the image into the registry manually, the FFU works as expected. In my opinion, the image should be uploaded into the undercloud registry during the "openstack overcloud external-upgrade run ${EXTERNAL_ANSWER} --stack ${STACK} --tags container_image_prepare" step.

Version-Release number of selected component (if applicable):

rpm -qa | grep tripleo
ansible-role-tripleo-modify-image-1.5.1-17.1.20230622042720.b6eedb6.el8ost.noarch
openstack-tripleo-common-containers-15.4.1-17.1.20230927003755.el8ost.noarch
openstack-tripleo-heat-templates-14.3.1-17.1.20231103003744.el8ost.noarch
openstack-tripleo-validations-14.3.2-17.1.20231026023743.2b526f8.el8ost.noarch
openstack-tripleo-common-15.4.1-17.1.20230927003755.el8ost.noarch
ansible-tripleo-ipsec-11.0.1-17.1.20230621182214.b5559c8.el8ost.noarch
python3-tripleo-common-15.4.1-17.1.20230927003755.el8ost.noarch
python3-tripleoclient-16.5.1-17.1.20230927003754.f3599d0.el8ost.noarch
openstack-tripleo-puppet-elements-14.1.3-17.1.20230811123850.b4e0cbd.el8ost.noarch
python3-tripleoclient-heat-installer-12.6.1-2.20220725105244.8cc1d6d.el8ost.noarch
openstack-tripleo-image-elements-13.1.3-17.1.20230622064519.a641940.el8ost.noarch
ansible-tripleo-ipa-0.3.1-17.1.20230627183823.8d29d9e.el8ost.noarch
puppet-tripleo-14.2.3-17.1.20231102193745.40278e1.el8ost.noarch
tripleo-ansible-3.3.1-17.1.20231101233745.4d015bf.el8ost.noarch

The overcloud upgrade prepare command line looks like:

openstack overcloud upgrade prepare ${PREPARE_ANSWER} \
  --stack dcn1 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  -n /home/stack/dcn1/network/network_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/barbican-backend-simple-crypto.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/barbican-edge.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/dcn-hci.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-multiple-nics.yaml \
  -e /home/stack/dcn1/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs.yaml \
  -e /home/stack/dcn1/network/network-environment.yaml \
  -e /home/stack/dcn1/inject-trust-anchor.yaml \
  -e /home/stack/dcn1/hostnames.yml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/dcn1/glance.yaml \
  -e /home/stack/dcn1/nodes_data.yaml \
  -e /home/stack/dcn1/debug.yaml \
  -e /home/stack/dcn1/use-dns-for-vips.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-dashboard.yaml \
  -e /home/stack/central_ceph_external.yaml \
  -e /home/stack/central-export.yaml \
  -e /home/stack/dcn1/config_heat.yaml \
  -e /home/stack/dcn1/firstboot.yaml \
  -e ~/containers-prepare-parameter.yaml \
  -e /home/stack/dcn1/barbican.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-everywhere-endpoints-dns.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-internal-tls.yaml \
  -e /home/stack/dcn1/cloud-names.yaml \
  -e /home/stack/dcn1/ipaservices-baremetal-ansible.yaml \
  -e /home/stack/cli_opts_params.yaml \
  -e /home/stack/overcloud-params.yaml \
  -e /home/stack/overcloud-deploy/dcn1/dcn1-network-environment.yaml \
  -e /home/stack/tmp/dcn1-baremetal_deployment.yaml \
  -e /home/stack/tmp/central-generated-networks-deployed.yaml \
  -e /home/stack/tmp/central-generated-vip-deployed.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/cephadm/cephadm-rbd-only.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/cephadm/ceph-dashboard.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/nova-hw-machine-type-upgrade.yaml \
  -e /home/stack/containers-prepare-parameter.yaml \
  --roles-file /home/stack/dcn1/roles/roles_data.yaml 2>&1

How reproducible: Always
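The per-stack comparison of the generated playbooks described in this report can be repeated for any number of stacks with a small helper. This is only a sketch: the function names has_image_prepare_task and check_stacks are made up for illustration, and the directory layout is assumed to match the overcloud-deploy/<stack>/config-download/<stack>/ paths quoted in the report.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TripleO): report whether the generated
# step1 external deploy playbook of each stack contains the image prepare task.

# Succeeds if the playbook contains the task name quoted in the report.
# A missing or unreadable file counts as "task not found".
has_image_prepare_task() {
    local playbook="$1"
    grep -qF "Run tripleo-container-image-prepare role" "$playbook" 2>/dev/null
}

# Usage: check_stacks <overcloud-deploy dir> <stack> [<stack> ...]
check_stacks() {
    local base="$1"
    shift
    local stack playbook
    for stack in "$@"; do
        # Assumed layout, taken from the paths shown in the report.
        playbook="$base/$stack/config-download/$stack/external_deploy_steps_tasks_step1.yaml"
        if has_image_prepare_task "$playbook"; then
            echo "$stack: present"
        else
            echo "$stack: missing"
        fi
    done
}
```

For example, "check_stacks /home/stack/overcloud-deploy central dcn1" would be expected to flag dcn1 as missing on the environment described here.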
I'd need to see the upgrade prepare and upgrade run (which you've now provided), and all the custom templates and files passed to those commands.
I'm thinking the issue is due to the OS::TripleO::Services::ContainerImagePrepare service being only on the Controller role, so when the commands are run for the dcn stacks, it does not re-run ContainerImagePrepare. We may need to add that service to dcn roles, or document which container image prepare command to run manually.
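One way to check this hypothesis against a given roles file is to list which roles include the service. The following is a rough sketch, not a YAML parser: the function name roles_with_service is hypothetical, and it assumes the usual roles_data.yaml layout in which each role starts with "- name: <Role>" at column 0 and services appear as list items.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every role in a roles file that
# lists the given service. Assumes each role begins with "- name: <Role>"
# at column 0 (an assumption about the file layout, not a YAML parse).
roles_with_service() {
    local roles_file="$1" service="$2"
    awk -v svc="$service" '
        # Remember the most recent role name.
        /^- name:/ { role = $3 }
        {
            line = $0
            # Strip the leading list marker and any trailing whitespace.
            sub(/^[[:space:]]*-[[:space:]]*/, "", line)
            sub(/[[:space:]]+$/, "", line)
            # Exact match only, so longer service names do not match.
            if (line == svc) print role
        }
    ' "$roles_file"
}
```

Running "roles_with_service /home/stack/dcn1/roles/roles_data.yaml OS::TripleO::Services::ContainerImagePrepare" would print nothing if no DCN role carries the service, which is the situation suspected above.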
(In reply to James Slagle from comment #4)
> I'm thinking the issue is due to the
> OS::TripleO::Services::ContainerImagePrepare service being only on the
> Controller role, so when the commands are run for the dcn stacks, it does
> not re-run ContainerImagePrepare. We may need to add that service to dcn

I can test that by manually adding the service to my DCN roles and rerunning the job, if that's helpful.

> roles, or document which container image prepare command to run manually.
(In reply to Marian Krcmarik from comment #5)
> (In reply to James Slagle from comment #4)
> > I'm thinking the issue is due to the
> > OS::TripleO::Services::ContainerImagePrepare service being only on the
> > Controller role, so when the commands are run for the dcn stacks, it does
> > not re-run ContainerImagePrepare. We may need to add that service to dcn
> I can test that - manually adding the service to my dcn roles and rerun the
> job If That's helpful?
> > roles, or document which container image prepare command to run manually.

Yes, that's worth trying. I'm not sure if it will remove unmanaged images though. Can you check whether the other service images (such as nova-api) are still in the undercloud image-serve? That would be helpful.
(In reply to James Slagle from comment #6)
> Yes, that's worth trying. I'm not sure if it will remove unmanaged images
> though. Can you check if the other service images (such as nova-api) are
> still in the undercloud image-serve, that would be helpful.

Manually adding the OS::TripleO::Services::ContainerImagePrepare service to the roles did solve the problem, and the etcd image was successfully fetched during the upgrade. Other images that are not needed by any service on the DCN site still appear to be present in the undercloud image server: for example, I can see the openstack-nova-api image there and I can pull it. I'll let you decide which way we want to fix this.
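A check of whether a given image tag is present in the undercloud registry could be sketched as follows. The helper name tags_json_has_tag is hypothetical. It expects the JSON body of the registry v2 "tags/list" endpoint on stdin and does a plain quoted-string search instead of a real JSON parse. Whether the undercloud image server answers that endpoint over plain HTTP on port 8787 is an assumption here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the tag list JSON read on stdin contains
# the given tag. Input is expected to look like the registry v2 tags/list
# response, e.g. {"name": "...", "tags": ["a", "b"]}.
# This is a quoted-string search, not a JSON parse.
tags_json_has_tag() {
    local tag="$1"
    grep -qF "\"$tag\""
}

# Example use against the registry named in this report (assumes plain HTTP
# and that the tags/list endpoint is served there):
#
#   curl -s http://site-undercloud-0.ctlplane.redhat.local:8787/v2/rh-osbs/rhosp17-openstack-etcd/tags/list \
#       | tags_json_has_tag 17.1_20231116.2 && echo present || echo missing
```

The same check with rhosp17-openstack-nova-api in the path would cover the question of whether images not used on the DCN site are still served.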
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: openstack-tripleo-heat-templates and tripleo-ansible update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:2736