Description of problem:
During FFU of the Compute DVR use case with OVS, the upgrade failed because Ansible was pulling the wrong openstack-neutron-l3-agent container image.
~~~
TASK [Pre-fetch all the containers] ********************************************
Saturday 18 July 2020 13:09:43 -0400 (0:00:00.441) 0:13:08.800 *********
failed: [overcloud-novacompute-dvr-0] (item=registry.redhat.io/rhosp-rhel8/openstack-neutron-l3-agent:16.1) => {"ansible_loop_var": "prefetch_image", "changed": false, "msg": "Failed to pull image registry.redhat.io/rhosp-rhel8/openstack-neutron-l3-agent:16.1", "prefetch_image": "registry.redhat.io/rhosp-rhel8/openstack-neutron-l3-agent:16.1"}
~~~

The actual container is
~~~
$ openstack tripleo container image list | grep l3
| docker://undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-neutron-l3-agent:16.1-39 |
~~~

Below is the command used to run the upgrade
~~~
openstack overcloud upgrade run --stack overcloud --limit overcloud-novacompute-dvr-0
~~~

Below is the prepare command we used
~~~
$ cat osp16/00_overcloud-prepare.sh
#!/bin/bash
mv nohup.out nohup-data/nohup.out-$(date +"%m-%d-%y"_"%T")
nohup openstack overcloud upgrade prepare --templates \
    -r /home/stack/templates/roles_data.yaml \
    -e /home/stack/templates/node-info.yaml \
    -e /home/stack/templates/rhsm.yaml \
    -e /home/stack/templates/upgrades-environment.yaml \
    -e /home/stack/containers-prepare-parameter.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dvr.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/templates/neutrondvr.yaml \
    -e /home/stack/templates/network-environment.yaml \
    --ntp-server 192.168.24.1 --log-file overcloud_upgrade_prepare_1.log &
~~~

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 16.1-Beta

How reproducible:

Steps to Reproduce:
1.
2.
3.
Actual results:
Ansible is pulling a container image which doesn't exist.

Expected results:
It should use the correct and latest container image available.

Additional info:
Please add the content of the YAML files and describe how you are creating the scripts. The container images seem to be OK.
Hello,

Could you please confirm that you had the OS::TripleO::Services::NeutronL3Agent service enabled in your roles_data? In theory, the workaround you added should be automatically included in the containers-prepare-parameter step if you have the right service enabled:
https://github.com/openstack/tripleo-common/blob/master/container-images/overcloud_containers.yaml.j2#L462-L467

Also, do you have a CI job I could have a look at? Or did you run the procedure manually? The logs look confusing to me.
I missed the needinfo from comment 3 above.
@jose, this is a ComputeDVR node, so the L3 agent service is available in the role and runs on the compute node.
~~~
openstack overcloud roles generate -o /home/stack/templates/roles_data.yaml Controller ComputeDVR
~~~

>> /home/stack/templates/roles_data.yaml
~~~
###############################################################################
# Role: ComputeDVR                                                            #
###############################################################################
- name: ComputeDVR
  description: |
    DVR enabled Compute Node role
  CountDefault: 1
  tags:
    - external_bridge
  networks:
    InternalApi:
      subnet: internal_api_subnet
    Tenant:
      subnet: tenant_subnet
    Storage:
      subnet: storage_subnet
  HostnameFormatDefault: '%stackname%-novacompute-dvr-%index%'
  RoleParametersDefault:
    TunedProfileName: "virtual-host"
  update_serial: 25
  ServicesDefault:
    - OS::TripleO::Services::Aide
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::BootParams
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent << l3 agent service
~~~

~~~
(overcloud) [stack@undercloud ~]$ cat ~/containers-prepare-parameter.yaml
# Generated with the following on 2020-07-09T11:46:41.783514
#
#   openstack tripleo container image prepare default --local-push-destination --output-env-file containers-prepare-parameter.yaml
#
parameter_defaults:
  DockerInsecureRegistryAddress:
  - 192.168.24.1:8787
  - undercloud.ctlplane.localdomain:8787
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_alertmanager_image: ose-prometheus-alertmanager
      ceph_alertmanager_namespace: registry.redhat.io/openshift4
      ceph_alertmanager_tag: 4.1
      ceph_grafana_image: rhceph-4-dashboard-rhel8
      ceph_grafana_namespace: registry.redhat.io/rhceph
      ceph_grafana_tag: 4
      ceph_image: rhceph-4-rhel8
      ceph_namespace: registry.redhat.io/rhceph
      ceph_node_exporter_image: ose-prometheus-node-exporter
      ceph_node_exporter_namespace: registry.redhat.io/openshift4
      ceph_node_exporter_tag: v4.1
      ceph_prometheus_image: ose-prometheus
      ceph_prometheus_namespace: registry.redhat.io/openshift4
      ceph_prometheus_tag: 4.1
      ceph_tag: latest
      name_prefix: openstack-
      name_suffix: ''
      namespace: registry.redhat.io/rhosp-beta
      neutron_driver: openvswitch
      rhel_containers: false
      tag: '16.1'
      name_prefix_stein: openstack-
      name_suffix_stein: ''
      namespace_stein: registry.redhat.io/rhosp15-rhel8
      tag_stein: 15.0
    tag_from_label: '{version}-{release}'
  ContainerImageRegistryCredentials:
    registry.redhat.io:
      user:pass
~~~
So, I had a look at the stack's environment parameters, and it seems this is a common pattern in other images:

    ContainerManilaShareImage: registry.redhat.io/rhosp-rhel8/openstack-manila-share:16.1
    ContainerMemcachedConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-memcached:16.1-44
    ContainerMemcachedImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-memcached:16.1-44
    ContainerMetricsQdrConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-qdrouterd:16.1-43
    ContainerMetricsQdrImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-qdrouterd:16.1-43
    ContainerMistralApiImage: registry.redhat.io/rhosp-rhel8/openstack-mistral-api:16.1
    ContainerMistralApiImageStein: ''
    ContainerMistralConfigImage: registry.redhat.io/rhosp-rhel8/openstack-mistral-api:16.1
    ContainerMistralEngineImage: registry.redhat.io/rhosp-rhel8/openstack-mistral-engine:16.1
    ContainerMistralEventEngineImage: registry.redhat.io/rhosp-rhel8/openstack-mistral-event-engine:16.1
    ContainerMistralExecutorImage: registry.redhat.io/rhosp-rhel8/openstack-mistral-executor:16.1
    ContainerMultipathdConfigImage: registry.redhat.io/rhosp-rhel8/openstack-multipathd:16.1
    ContainerMultipathdImage: registry.redhat.io/rhosp-rhel8/openstack-multipathd:16.1
    ContainerMysqlClientConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-mariadb:16.1-43
    ContainerMysqlConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-mariadb:16.1-43
    ContainerMysqlImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-mariadb:16.1-43
    ContainerNeutronApiImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-neutron-server:16.1-40
    ContainerNeutronApiImageStein: ''
    ContainerNeutronConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-neutron-server:16.1-40
    ContainerNeutronDHCPImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-neutron-dhcp-agent:16.1-39
    ContainerNeutronL3AgentImage: registry.redhat.io/rhosp-rhel8/openstack-neutron-l3-agent:16.1
    ContainerNeutronMetadataImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-neutron-metadata-agent:16.1-43
    ContainerNeutronMlnxImage: registry.redhat.io/rhosp-rhel8/openstack-neutron-mlnx-agent:16.1
    ContainerNeutronSriovImage: registry.redhat.io/rhosp-rhel8/openstack-neutron-sriov-agent:16.1
    ContainerNovaApiImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-api:16.1-41
    ContainerNovaApiImageStein: ''
    ContainerNovaComputeImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-compute:16.1-37
    ContainerNovaComputeIronicImage: registry.redhat.io/rhosp-rhel8/openstack-nova-compute-ironic:16.1
    ContainerNovaConductorImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-conductor:16.1-39
    ContainerNovaConductorImageStein: ''
    ContainerNovaConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-api:16.1-41
    ContainerNovaLibvirtConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-compute:16.1-37
    ContainerNovaLibvirtImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-libvirt:16.1-40
    ContainerNovaMetadataConfigImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-api:16.1-41
    ContainerNovaMetadataImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-api:16.1-41
    ContainerNovaSchedulerImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-scheduler:16.1-38
    ContainerNovaSerialproxyConfigImage: registry.redhat.io/rhosp-rhel8/openstack-nova-serialproxy:16.1
    ContainerNovaSerialproxyImage: registry.redhat.io/rhosp-rhel8/openstack-nova-serialproxy:16.1
    ContainerNovaVncProxyImage: undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-nova-novncproxy:16.1-40
    ContainerNovajoinConfigImage: registry.redhat.io/rhosp-rhel8/openstack-novajoin-server:16.1
    ContainerNovajoinNotifierImage: registry.redhat.io/rhosp-rhel8/openstack-novajoin-notifier:16.1
    ContainerNovajoinServerImage: registry.redhat.io/rhosp-rhel8/openstack-novajoin-server:16.1

ContainerNeutronL3AgentImage isn't the only image that points at the bare 16.1 tag; there are others, such as ContainerMistralConfigImage or ContainerMistralExecutorImage.

Looking deeper, all the container images using the 16.1 tag are, for some unknown reason, using the rhosp-rhel8 namespace, which isn't right because that namespace doesn't have a 16.1 tag registered yet:
https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-neutron-l3-agent/5de6baf6d70cc51644a57066

The rhosp-beta namespace is the one containing the right tag:
https://catalog.redhat.com/software/containers/rhosp-beta/openstack-neutron-l3-agent/5cdc8278d70cc57c44b28750

I believe the reason is the tag_from_label parameter, which was included in containers-prepare-parameter.yaml. I've removed it from the file and relaunched the upgrade prepare step, and I will check the content of the images again. Probably this time they will all use the rhosp-beta namespace.
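One way to spot the affected parameters quickly is to flag every image value that does not point at the undercloud registry. The following is only a minimal sketch and not part of TripleO: the function name `find_unprepared_images` is hypothetical, and the sample values are a small subset copied from the parameter list above.

```python
def find_unprepared_images(params, local_registry):
    """Return {parameter: image} for non-empty values not hosted on local_registry.

    Hypothetical helper for illustration; it only compares string prefixes.
    """
    prefix = local_registry + "/"
    return {
        name: image
        for name, image in params.items()
        if image and not image.startswith(prefix)
    }


# Subset of the stack environment parameters quoted in this comment.
params = {
    "ContainerNeutronDHCPImage": "undercloud.ctlplane.localdomain:8787/rhosp-beta/openstack-neutron-dhcp-agent:16.1-39",
    "ContainerNeutronL3AgentImage": "registry.redhat.io/rhosp-rhel8/openstack-neutron-l3-agent:16.1",
    "ContainerNeutronApiImageStein": "",
    "ContainerMistralApiImage": "registry.redhat.io/rhosp-rhel8/openstack-mistral-api:16.1",
}

suspect = find_unprepared_images(params, "undercloud.ctlplane.localdomain:8787")
for name, image in sorted(suspect.items()):
    print(f"{name}: {image}")
```

Parameters flagged this way still carry a default value rather than one prepared for the local registry, which only matters for services that are actually enabled on some role.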
Created attachment 1702031 [details] Roles data
So, I was looking at the wrong place. The problem is that the ComputeNeutronL3Agent service is enabled but NeutronL3Agent is not, so the image Heat parameter is never set. This can be seen here:

    - imagename: "{{namespace}}/{{name_prefix}}neutron-l3-agent{{name_suffix}}:{{tag}}"
      image_source: kolla
      params:
      - ContainerNeutronL3AgentImage
      services:
      - OS::TripleO::Services::NeutronL3Agent

https://github.com/openstack/tripleo-common/blob/stable/train/container-images/overcloud_containers.yaml.j2#L482-L487

The only service which triggers the image name creation is NeutronL3Agent; the ComputeNeutronL3Agent service does not appear in the file at all. The solution would be to do something similar to what is done for ComputeNeutronOvsAgent and NeutronOvsAgent (https://github.com/openstack/tripleo-common/blob/stable/train/container-images/overcloud_containers.yaml.j2#L496-L502):

    - imagename: "{{namespace}}/{{name_prefix}}neutron-l3-agent{{name_suffix}}:{{tag}}"
      image_source: kolla
      params:
      - ContainerNeutronL3AgentImage
      services:
      - OS::TripleO::Services::NeutronL3Agent
      - OS::TripleO::Services::ComputeNeutronL3Agent

This would ensure we get the right container image for the openstack-neutron-l3-agent container.

Moving the BZ to DFG:Networking, as they are more familiar with this type of setup. However, the configuration seems to be fine, as the environment deployed properly with DVR in OSP13. I attached the full roles_data.yaml from the environment in the previous comment.
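The effect of the missing service entry can be illustrated with a simplified model of the selection step. This is a sketch under stated assumptions, not the tripleo-common implementation: the function `params_for_enabled_services` is hypothetical, and it assumes an entry is selected when at least one of its listed services is enabled on some role.

```python
IMAGENAME = "{{namespace}}/{{name_prefix}}neutron-l3-agent{{name_suffix}}:{{tag}}"


def params_for_enabled_services(entries, enabled_services):
    """Collect the Heat image parameters of entries matching an enabled service.

    Simplified, hypothetical model: an entry with no 'services' key is
    treated as always selected.
    """
    enabled = set(enabled_services)
    selected = []
    for entry in entries:
        services = entry.get("services")
        if services is None or enabled & set(services):
            selected.extend(entry.get("params", []))
    return selected


# Entry as it appears in overcloud_containers.yaml.j2 (stable/train).
current_entry = {
    "imagename": IMAGENAME,
    "image_source": "kolla",
    "params": ["ContainerNeutronL3AgentImage"],
    "services": ["OS::TripleO::Services::NeutronL3Agent"],
}

# Entry with the proposed additional service.
proposed_entry = dict(
    current_entry,
    services=[
        "OS::TripleO::Services::NeutronL3Agent",
        "OS::TripleO::Services::ComputeNeutronL3Agent",
    ],
)

# Services enabled on the ComputeDVR role (subset).
enabled = [
    "OS::TripleO::Services::ComputeNeutronCorePlugin",
    "OS::TripleO::Services::ComputeNeutronL3Agent",
]

before = params_for_enabled_services([current_entry], enabled)
after = params_for_enabled_services([proposed_entry], enabled)
print("current mapping: ", before)
print("proposed mapping:", after)
```

Under this model the current mapping selects nothing for a deployment that only enables ComputeNeutronL3Agent, while the proposed mapping selects ContainerNeutronL3AgentImage.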
The analysis from comment 9 looks good; we should have the ComputeNeutron* variants listed in the overcloud_containers mapping. Most of them can run directly on compute nodes, for example in the ComputeDVR role:
https://github.com/openstack/tripleo-heat-templates/blob/master/roles/ComputeDVR.yaml

DCN deployments can also have dhcp/metadata on compute nodes.
During the FFU 13 to 16.1 lab testing in PSI, the very same issue was observed with openstack-neutron-metadata-agent and ComputeNeutronMetadataAgent. If this service is enabled in any of the roles, the image won't be downloaded into the undercloud's registry, because the service isn't present in container-images/tripleo_containers.yaml.j2:
https://github.com/openstack/tripleo-common/blob/master/container-images/tripleo_containers.yaml.j2#L470
The upgrade then fails when trying to upgrade a compute node that includes such a service.

@Brent, if it's OK with you, I will add such a patch based on https://review.opendev.org/#/c/761418/. Also, it would be great to update the Knowledge Base entry to cover the ComputeNeutronMetadataAgent service too until these patches merge.
For the knowledge base entry, this would solve it:

    sudo hiera container_image_prepare_node_names
    ["undercloud.ctlplane.prod.ipa.lab"]

    parameter_defaults:
      ContainerNeutronL3AgentImage: undercloud.ctlplane.prod.ipa.lab:8787/rhosp-rhel8/openstack-neutron-l3-agent:16.1-51
      ContainerNeutronMetadataImage: undercloud.ctlplane.prod.ipa.lab:8787/rhosp-rhel8/openstack-neutron-metadata-agent:16.1
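The override above can also be assembled from the registry host name returned by hiera. The following is only an illustrative sketch: the helper `build_override` is hypothetical, and the port, namespace, and tags are simply the values quoted in this comment, so they need to be adjusted to whatever the local registry actually holds.

```python
def build_override(registry, namespace, images):
    """Render a parameter_defaults environment snippet as text.

    Hypothetical helper; 'images' maps a Heat parameter name to an
    (image name, tag) pair.
    """
    lines = ["parameter_defaults:"]
    for param, (name, tag) in images.items():
        lines.append(f"  {param}: {registry}/{namespace}/{name}:{tag}")
    return "\n".join(lines) + "\n"


# Values taken from this comment; they are environment specific.
override = build_override(
    registry="undercloud.ctlplane.prod.ipa.lab:8787",
    namespace="rhosp-rhel8",
    images={
        "ContainerNeutronL3AgentImage": ("openstack-neutron-l3-agent", "16.1-51"),
        "ContainerNeutronMetadataImage": ("openstack-neutron-metadata-agent", "16.1"),
    },
)
print(override, end="")
```

The resulting text can be saved as an environment file and passed with `-e` to the same `openstack overcloud upgrade prepare` command used earlier in this report.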
*** Bug 1893638 has been marked as a duplicate of this bug. ***