Description of problem:
python-paunch-2.5.3-9.el7ost.noarch (which includes https://review.opendev.org/#/c/753082/) breaks overcloud deployments because it tries to pull images tagged 'pcmklatest'. These tags do not exist on the registry. They exist only on the controllers, where they are applied on the fly as needed:

2020-11-05 09:09:27Z [overcloud.AllNodesDeploySteps.ControllerOpenstackDeployment_Step2]: CREATE_FAILED Resource CREATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
Stack overcloud CREATE_FAILED
overcloud.AllNodesDeploySteps.ControllerOpenstackDeployment_Step2.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 4044bde9-8089-4e4d-8c1d-c1aec2222fce
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
        "Error: image rh-osbs/rhosp13-openstack-redis:pcmklatest not found",
        "Error pulling 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis:pcmklatest. [1]",
        "stderr: Error: image rh-osbs/rhosp13-openstack-redis:pcmklatest not found"
    ]
}
to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/315a708d-49d6-4091-ba44-f40e9f0e65e9_playbook.retry

We probably need to filter out these images/tags if we want to stick to 'pull by default'.
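A minimal sketch of the "filter out these images/tags" idea, assuming the pull step receives a flat list of image references. The names LOCAL_ONLY_TAGS, image_tag and filter_pullable_images are hypothetical and are not part of paunch's code or API. The sketch only shows how a locally applied tag such as 'pcmklatest' could be skipped before pulling. It has to tell a registry host:port colon apart from a tag colon.

```python
from typing import Iterable, List, Optional

# Hypothetical: tags applied locally on the controllers that never exist
# on the registry, so they must not be pulled.
LOCAL_ONLY_TAGS = frozenset({"pcmklatest"})


def image_tag(ref: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it has no tag."""
    # Digest references (name@sha256:...) are pulled by digest, not by tag.
    if "@" in ref:
        return None
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon before the last slash belongs to the registry host:port
    # (e.g. 192.168.24.1:8787), not to the tag.
    if last_colon > last_slash:
        return ref[last_colon + 1:]
    return None


def filter_pullable_images(refs: Iterable[str]) -> List[str]:
    """Drop references whose tag only exists locally, preserving order."""
    return [ref for ref in refs if image_tag(ref) not in LOCAL_ONLY_TAGS]


if __name__ == "__main__":
    images = [
        "192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis:pcmklatest",
        "192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis:latest",
        "192.168.24.1:8787/rh-osbs/rhosp13-openstack-haproxy",
    ]
    # Only the two references that can exist on the registry remain.
    print(filter_pullable_images(images))
```

Filtering by tag, rather than by image name, keeps "pull by default" for the same image when it is referenced with a registry tag. The trade-off is that the list of local-only tags has to be known in advance.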
The pcmk containers aren't managed by paunch, so they should not be in the paunch configuration. We did not see this upstream on the OVB jobs, which use pacemaker. Please provide the system logs and paunch.log.
Created attachment 1726898: paunch.log (controller-0)
Created attachment 1726899: journal
After investigating, it looks like paunch should be managing redis, but the tag was changed as if redis were managed by pacemaker. This appears to be specific to redis at the moment. Can you also provide the templates used in the deployment?
(In reply to Alex Schultz from comment #4)
> Investigating, it looks like paunch should be managing redis but the tag was
> changed like it was managed under pacemaker. It seems to be specific to
> redis at the moment. Can you also provide the templates used in the
> deployment?

Thanks, Alex, for the fast triage. This is indeed something in THT (redis with TLS-e) that breaks things for paunch. We're having a look.
[root@undercloud-0 database]# diff redis.yaml.old redis.yaml
271c271
< image: *redis_image_pcmklatest
---
> image: *redis_config_image

Using redis_config_image for the TLS proxy seems to be enough.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 13.0 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5575