Steps to Reproduce:
Deploy a cluster with the ComputeSriov role without enabling the OS::TripleO::Services::NeutronMlnxAgent TripleO service (environments/services/neutron-mlnx-agent.yaml).

Actual results:
Container image prepare tries to pull the image even though the TripleO service is disabled:
2019-10-06 08:54:26,630.630 764843 ERROR root [ ] Image prepare failed: 401 Client Error: Unauthorized for url: https://registry-1.docker.io/v2/tripleomaster/centos-binary-neutron-mlnx-agent/manifests/current-tripleo-rdo

Expected results:
By default, container image prepare should ignore images whose services are disabled, and the deployment should succeed.
Working with the Mellanox team: https://review.opendev.org/#/c/687484/
Container image prepare has been integrated as a TripleO service with external_deploy_tasks, and the actual preparation is handled by the tripleo-container-image-prepare role in the tripleo-ansible project. Because of this change, the list of environment files can no longer be passed in the way it could with the earlier "openstack overcloud container image prepare" command. As a result, images are prepared for every service in the role, whether or not that service is enabled.

I looked at ways to pass a TripleO role's enabled services to image preparation. That information cannot be obtained in "OS::TripleO::Services::ContainerImagePrepare", because the services' ResourceChain is not yet available at that point. Another option is the Ansible group var "enabled_services", set on the "overcloud" group, which holds the list of enabled services. The problem is that "enabled_services" contains the service_name defined in each TripleO service template, and there is no existing mapping between a TripleO service template type (e.g. OS::TripleO::Services::ContainerImagePrepare) and its service_name (e.g. container_image_prepare). "enabled_services" could be used if the image-to-service-to-parameter mapping in "tripleo-common/container-images/overcloud_containers.yaml.j2" were extended to record the service_name alongside the TripleO service types. I will do a quick PoC to see whether this can be implemented.
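A minimal sketch of the proposed enabled_services filtering, assuming the image mapping were extended to carry a service_name next to each service type. The ImageEntry structure, the images_for_enabled_services helper and the service_name values (nova_compute, neutron_sriov_agent, neutron_mlnx_agent) are all hypothetical illustrations, not the contents of overcloud_containers.yaml.j2 or the tripleo-ansible role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageEntry:
    """Hypothetical entry of an extended image-to-service mapping."""
    image: str                       # image name without namespace/prefix/tag
    service_types: tuple[str, ...]   # TripleO service template types
    service_names: tuple[str, ...]   # proposed: matching service_name values


# Hypothetical excerpt; the real mapping has no service_names field today.
IMAGE_MAP = [
    ImageEntry("nova-compute",
               ("OS::TripleO::Services::NovaCompute",),
               ("nova_compute",)),
    ImageEntry("neutron-sriov-agent",
               ("OS::TripleO::Services::NeutronSriovAgent",),
               ("neutron_sriov_agent",)),
    ImageEntry("neutron-mlnx-agent",
               ("OS::TripleO::Services::NeutronMlnxAgent",),
               ("neutron_mlnx_agent",)),
]


def images_for_enabled_services(image_map, enabled_services):
    """Keep only images whose service_name is in enabled_services."""
    enabled = set(enabled_services)
    return [entry.image for entry in image_map
            if any(name in enabled for name in entry.service_names)]


# enabled_services as it might look on the "overcloud" group when
# NeutronMlnxAgent is not enabled (values are hypothetical).
enabled_services = ["nova_compute", "neutron_sriov_agent"]
print(images_for_enabled_services(IMAGE_MAP, enabled_services))
# → ['nova-compute', 'neutron-sriov-agent']
```

Because the filter keys on service_name rather than service type, it only works if every image entry records the service_name of each service that uses it, which is the mapping change described above.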
Given the way it currently works, one option is to copy roles/ComputeSriov.yaml and remove OS::TripleO::Services::NeutronMlnxAgent from the copy. Another option might be to add the following to the ContainerImagePrepare entries so that the image is ignored: excludes: [neutron-mlnx-agent]. If you can supply the full ContainerImagePrepare parameter, I can point out where to add the excludes. See the upstream docs [1] for details.

This bug demonstrates a drawback in the current prepare process: when a service type exists in the roles data but is mapped to OS::Heat::None in the environment, the prepare process still considers the service deployed, because it doesn't have access to the service type mappings. Usually this just means unneeded images are transferred, but in this case the image doesn't exist, which is a scenario I didn't consider ;) I'm hoping one of the two workarounds above will be enough for Mellanox for now. The enabled_services approach looks like it might be better than the current service types approach, but switching over might take some time and effort.

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html#layering-image-preparation-entries
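A minimal sketch contrasting the current behaviour (image selection driven only by the service types listed in the role) with the excludes workaround and with a registry-aware selection that honours OS::Heat::None. The role excerpt, the IMAGE_FOR_SERVICE mapping and all helper names are hypothetical, and treating excludes as regular expressions searched against the image name is an assumption about how ContainerImagePrepare matches them; this is not tripleo-common's implementation.

```python
import re

# Hypothetical excerpt in the style of roles/ComputeSriov.yaml.
ROLE_SERVICES = [
    "OS::TripleO::Services::NovaCompute",
    "OS::TripleO::Services::NeutronSriovAgent",
    "OS::TripleO::Services::NeutronMlnxAgent",
]

# Hypothetical service type -> image name mapping.
IMAGE_FOR_SERVICE = {
    "OS::TripleO::Services::NovaCompute": "nova-compute",
    "OS::TripleO::Services::NeutronSriovAgent": "neutron-sriov-agent",
    "OS::TripleO::Services::NeutronMlnxAgent": "neutron-mlnx-agent",
}

# Without environments/services/neutron-mlnx-agent.yaml, the Mellanox
# agent service type is mapped to OS::Heat::None.
RESOURCE_REGISTRY = {
    "OS::TripleO::Services::NeutronMlnxAgent": "OS::Heat::None",
}


def images_from_roles(role_services):
    """Current behaviour: every service type in the role counts."""
    return [IMAGE_FOR_SERVICE[s] for s in role_services
            if s in IMAGE_FOR_SERVICE]


def images_from_roles_and_registry(role_services, registry):
    """Skip service types that the registry maps to OS::Heat::None."""
    enabled = [s for s in role_services
               if registry.get(s) != "OS::Heat::None"]
    return images_from_roles(enabled)


def apply_excludes(images, excludes):
    """Workaround: drop images matching any exclude pattern (assumed regex)."""
    return [image for image in images
            if not any(re.search(p, image) for p in excludes)]


current = images_from_roles(ROLE_SERVICES)
print(current)
# → ['nova-compute', 'neutron-sriov-agent', 'neutron-mlnx-agent']
print(apply_excludes(current, ["neutron-mlnx-agent"]))
# → ['nova-compute', 'neutron-sriov-agent']
print(images_from_roles_and_registry(ROLE_SERVICES, RESOURCE_REGISTRY))
# → ['nova-compute', 'neutron-sriov-agent']
```

The excludes workaround reaches the same result as registry-aware selection here, but it has to be maintained by hand for each disabled service, whereas the registry (or enabled_services) lookup would follow the environment automatically.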
Thanks Steve. Once the kolla patch [1] is merged, the issue will be fixed (it should have been added as a dependency of the THT patch, but was missed). It will still prepare images that are not needed for the deployment, though, so I will continue to work on the enabled_services approach. [1] https://review.opendev.org/#/c/672023/