Bug 1760403 - Container image pull fails for neutron-mlnx-agent even though the service is disabled
Summary: Container image pull fails for neutron-mlnx-agent even though the service is ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Saravanan KR
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-10 13:17 UTC by Saravanan KR
Modified: 2022-02-09 16:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-09 16:05:50 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 688100 | 0 | None | ABANDONED | Prepare images using the enabled service_names | 2020-10-06 22:16:39 UTC
Red Hat Issue Tracker | NFV-1122 | 0 | None | None | None | 2022-02-09 16:07:32 UTC
Red Hat Issue Tracker | OSP-12488 | 0 | None | None | None | 2022-02-05 09:17:56 UTC

Description Saravanan KR 2019-10-10 13:17:01 UTC
Steps to Reproduce:
Deploy a cluster with the ComputeSriov role, without enabling the OS::TripleO::Services::NeutronMlnxAgent TripleO service (environments/services/neutron-mlnx-agent.yaml).


Actual results:
Container image prepare tries to pull the image even though the TripleO service is disabled:
2019-10-06 08:54:26,630.630 764843 ERROR root [  ] Image prepare failed: 401 Client Error: Unauthorized for url: https://registry-1.docker.io/v2/tripleomaster/centos-binary-neutron-mlnx-agent/manifests/current-tripleo-rdo


Expected results:
Container image prepare should ignore images whose services are disabled by default, and the deployment should succeed.

Comment 1 Saravanan KR 2019-10-10 13:23:26 UTC
Working with the Mellanox team: https://review.opendev.org/#/c/687484/

Comment 2 Saravanan KR 2019-10-11 03:32:26 UTC
Container image prepare has been integrated as a TripleO service with external_deploy_tasks, and the actual preparation is handled by the tripleo-container-image-prepare role in tripleo-ansible. Because of this change, the list of environment files can no longer be provided as it was with the earlier command "openstack overcloud container image prepare". As a result, image preparation covers every service that is part of the role, irrespective of whether it is enabled.

Looking at ways to pass a TripleO role's enabled services to image preparation: this information cannot be obtained in "OS::TripleO::Services::ContainerImagePrepare", because the service's ResourceChain is not yet available at that point.

Another way is to use the Ansible group var "enabled_services", which is set on the "overcloud" group and holds the list of enabled services. The problem is that "enabled_services" contains the service_name defined in each TripleO service template, and there is no available mapping between the TripleO service template type (for example OS::TripleO::Services::ContainerImagePrepare) and its service_name (container_image_prepare). It is possible to use "enabled_services" by modifying the image-to-service-to-parameter mapping in "tripleo-common/container-images/overcloud_containers.yaml.j2" so that it carries the service_name alongside the TripleO service types. I will do a quick PoC to see if this can be implemented.

Comment 3 Steve Baker 2019-10-14 03:23:57 UTC
One option based on the way it currently works is to copy and modify roles/ComputeSriov.yaml to exclude OS::TripleO::Services::NeutronMlnxAgent.

Another option might be to add the following to the ContainerImagePrepare entries to ensure that image is ignored:
  excludes: [neutron-mlnx-agent]
If the full ContainerImagePrepare parameter can be supplied then I can point out where to add the excludes. See the upstream docs[1] for details.
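
For illustration only, a minimal sketch of what such an entry might look like. The namespace, name_prefix and tag values are assumptions inferred from the failing URL in the description, not taken from the reporter's actual ContainerImagePrepare parameter, so a real deployment's values will likely differ.

```yaml
parameter_defaults:
  ContainerImagePrepare:
  - set:
      # Assumed values, inferred from the registry URL in the error message.
      namespace: docker.io/tripleomaster
      name_prefix: centos-binary-
      tag: current-tripleo-rdo
    # Images whose names match an entry here are left out of the prepare.
    excludes:
    - neutron-mlnx-agent
```

If the parameter contains several entries, the excludes list would presumably need to be added to each entry that could otherwise select this image.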

This bug demonstrates a drawback in the current prepare process. When a service type exists in the roles data but is mapped to OS::Heat::None in the environment, the prepare process still considers the service deployed, because it doesn't have access to the service type mappings. Usually this just means unneeded images are transferred, but in this case the image doesn't exist, which is a scenario I didn't consider ;)
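
As a point of reference, a disabled service of this kind is expressed in the resource registry roughly as follows. This is a sketch: the mapping syntax is standard Heat, but which file carries the entry in a given deployment is an assumption.

```yaml
resource_registry:
  # The service type stays listed in roles/ComputeSriov.yaml, but resolves
  # to nothing at deploy time. Image prepare does not see this mapping.
  OS::TripleO::Services::NeutronMlnxAgent: OS::Heat::None
```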

I'm hoping one of the above two workarounds will be enough for Mellanox for now. The enabled_services approach looks like it might be better than the current service types approach, but it might take some time and effort to switch over.

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html#layering-image-preparation-entries

Comment 4 Saravanan KR 2019-10-14 08:39:21 UTC
Thanks Steve. Once the kolla patch [1] is merged, the issue will be fixed (it should have been added as a dependency of the THT patch, but that was missed).

It will still prepare images that are not necessary for the deployment, though, so I will continue to work on the enabled_services approach.

[1] https://review.opendev.org/#/c/672023/

