Description of problem: I initially deployed OSP 10 with telemetry disabled, and I am trying to run a fast forward upgrade. During the upgrade, the deployment steps fail because they cannot find the images for ceilometer, panko, gnocchi:

"2018-05-09 07:47:45,309 ERROR: 46773 -- ERROR configuring aodh",
"2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring gnocchi",
"2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring panko",
"2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring ceilometer",

Looking at the details for each one, I can see:

"2018-05-09 07:47:44,556 WARNING: 46774 -- docker pull failed: error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\"",
"2018-05-09 07:47:44,557 WARNING: 46774 -- retrying pulling image: registry.access.redhat.com/rhosp13/openstack-panko-api:latest",
"2018-05-09 07:47:44,557 ERROR: 46774 -- Failed to pull image: registry.access.redhat.com/rhosp13/openstack-panko-api:latest",
"2018-05-09 07:47:44,557 DEBUG: 46774 -- Trying to pull repository registry.access.redhat.com/rhosp13/openstack-panko-api ... ",
"2018-05-09 07:47:44,557 DEBUG: 46774 -- error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\"",
"2018-05-09 07:47:44,561 DEBUG: 46774 -- NET_HOST enabled",
"2018-05-09 07:47:44,561 DEBUG: 46774 -- Running docker command: /usr/bin/docker run --user root --name docker-puppet-panko --env PUPPET_TAGS=file,file_line,concat,augeas,cron,panko_api_paste_ini,panko_config --env NAME=panko --env HOSTNAME=overcloud-controller-0 --env NO_ARCHIVE= --env STEP=6 --volume /etc/localtime:/etc/localtime:ro --volume /tmp/tmp_lnTRQ:/etc/config.pp:ro,z --volume /etc/puppet/:/tmp/puppet-etc/:ro,z --volume /usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro,z --volume /var/lib/config-data:/var/lib/config-data/:z --volume tripleo_logs:/var/log/tripleo/ --volume /dev/log:/dev/log --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:z --entrypoint /var/lib/docker-puppet/docker-puppet.sh --net host --volume /etc/hosts:/etc/hosts:ro registry.access.redhat.com/rhosp13/openstack-panko-api:latest",
"2018-05-09 07:47:45,308 ERROR: 46774 -- Failed running docker-puppet.py for panko",
"2018-05-09 07:47:45,308 ERROR: 46774 -- Unable to find image 'registry.access.redhat.com/rhosp13/openstack-panko-api:latest' locally",
"Trying to pull repository registry.access.redhat.com/rhosp13/openstack-panko-api ... ",
"2018-05-09 07:47:45,308 INFO: 46774 -- Finished processing puppet configs for panko",

It tries to pull the image from registry.access.redhat.com/rhosp13, which does not actually exist. It should be trying to pull the images locally instead.
The issue is that, because I deploy without telemetry, the initial docker-images.yaml I create does not contain any entries for panko, ceilometer, etc. Looking at docker-puppet.json on the controller, for example, I can see:

[{"puppet_tags": "aodh_api_paste_ini,aodh_config", "config_volume": "aodh", "step_config": "include tripleo::profile::base::aodh::api\n\ninclude ::tripleo::profile::base::database::mysql::client", "config_image": "registry.access.redhat.com/rhosp13/openstack-aodh-api:latest"},

But for other entries, such as swift, I see:

{"puppet_tags": "swift_config,swift_container_config,swift_container_sync_realms_config,swift_account_config,swift_object_config,swift_object_expirer_config,rsync::server", "config_volume": "swift", "step_config": "include ::tripleo::profile::base::swift::storage\n\nclass xinetd() {}", "config_image": "192.0.2.1:8787/rhosp13/openstack-swift-proxy-server:2018-05-07.2"}]

That one points to the internal registry, which should be the right thing. So either all the components should be included in the docker-images.yaml file properly, or entries for optional components such as telemetry should not be added to docker-puppet.json.
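To spot which docker-puppet.json entries still point outside the undercloud registry, a small check along these lines may help. This is only a sketch: the registry prefix 192.0.2.1:8787 is taken from the swift entry quoted above, the function name find_external_images is made up for illustration, and the sample data is an abbreviated copy of the two entries in this report rather than the full attached file.

```python
import json

# Local undercloud registry prefix, taken from the swift entry in this report.
# Adjust it for your own environment (assumption).
LOCAL_REGISTRY = "192.0.2.1:8787/"


def find_external_images(entries, local_prefix=LOCAL_REGISTRY):
    """Return (config_volume, config_image) pairs not served by the local registry.

    `entries` is the list loaded from docker-puppet.json. This helper is
    hypothetical and is not part of TripleO.
    """
    return [
        (entry.get("config_volume"), entry.get("config_image"))
        for entry in entries
        if not str(entry.get("config_image", "")).startswith(local_prefix)
    ]


# Two abbreviated entries copied from the docker-puppet.json quoted above.
SAMPLE = json.loads("""
[
  {"config_volume": "aodh",
   "config_image": "registry.access.redhat.com/rhosp13/openstack-aodh-api:latest"},
  {"config_volume": "swift",
   "config_image": "192.0.2.1:8787/rhosp13/openstack-swift-proxy-server:2018-05-07.2"}
]
""")

if __name__ == "__main__":
    for volume, image in find_external_images(SAMPLE):
        print(f"{volume}: {image}")
```

To run it against a real node, replace SAMPLE with the result of json.load() on the docker-puppet.json file from the controller.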
Created attachment 1433610: docker-puppet.json
Created attachment 1433612: docker-images.yaml
My env file contains these settings, which disable exactly the components that fail when getting the docker image:

resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: nic-configs/compute.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: nic-configs/ceph-storage.yaml
  OS::TripleO::NodeUserData: first-boot.yaml
  OS::TripleO::NodeExtraConfigPost: post-install.yaml
  OS::TripleO::Services::CeilometerApi: OS::Heat::None
  OS::TripleO::Services::CeilometerCollector: OS::Heat::None
  OS::TripleO::Services::CeilometerExpirer: OS::Heat::None
  OS::TripleO::Services::CeilometerAgentCentral: OS::Heat::None
  OS::TripleO::Services::CeilometerAgentNotification: OS::Heat::None
  OS::TripleO::Services::CeilometerAgentIpmi: OS::Heat::None
  OS::TripleO::Services::ComputeCeilometerAgent: OS::Heat::None
  OS::TripleO::Services::GnocchiApi: OS::Heat::None
  OS::TripleO::Services::GnocchiMetricd: OS::Heat::None
  OS::TripleO::Services::GnocchiStatsd: OS::Heat::None
  OS::TripleO::Services::AodhApi: OS::Heat::None
  OS::TripleO::Services::AodhEvaluator: OS::Heat::None
  OS::TripleO::Services::AodhNotifier: OS::Heat::None
  OS::TripleO::Services::AodhListener: OS::Heat::None
  OS::TripleO::Services::PankoApi: OS::Heat::None
Hi Yolanda, taking this to try and triage. We may need to reach out to other folks (telemetry and/or deployment/containers possibly, not sure yet), so can you please clarify when this is happening? From comment #0 it sounds like it happened during the "openstack overcloud upgrade run --nodes Controller", i.e. after you run the ffwd-upgrade prepare, then the ffwd-upgrade run, and then the upgrade step, which fails.

Can you also please include the full command, especially all the -e env files you used on the ffwd-upgrade prepare (for example, to check the ordering of the env files, and where the contents of comment #3 are included)?

The OSP10 deployment: were you also setting OS::Heat::None for Ceilometer there? Or were you using something like https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/disabled/ceilometer-api-disabled.yaml ? Is Ceilometer in your roles_data.yaml (I am wondering whether this issue goes away if we remove it from there)?

I can't see where the image would be 'defaulted', since the service is disabled in your env, and what's more, why/how it defaults to registry.access.redhat.com. The containers DFG might know more. I'll bring this to scrum and can reach out to them later.
The problem seemed to occur when the docker.yaml env file is added after my own environment files. It seems that the resources I disabled are then enabled again. I am not hitting this if I use the CLI commands, but I was hitting it with older versions of tripleo-upgrade, which were setting the docker and docker-ha yaml files.
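The ordering effect described here can be illustrated with a simplified model: for the same resource_registry key, an environment file passed later with -e overrides one passed earlier. The sketch below is not Heat's actual merge code, merge_resource_registry is a made-up helper, and the template path used for the docker.yaml mapping is an illustrative assumption, not copied from a real file.

```python
def merge_resource_registry(*env_files):
    """Merge resource_registry sections in order; later files win.

    Simplified model of how environment files passed with -e are combined.
    Real Heat environment merging handles more cases than this sketch.
    """
    merged = {}
    for env in env_files:
        merged.update(env.get("resource_registry", {}))
    return merged


# Mapping taken from the environment file earlier in this report.
my_env = {
    "resource_registry": {
        "OS::TripleO::Services::PankoApi": "OS::Heat::None",
    }
}

# Hypothetical stand-in for what docker.yaml maps the service to.
docker_env = {
    "resource_registry": {
        "OS::TripleO::Services::PankoApi": "docker/services/panko-api.yaml",
    }
}

# docker.yaml after my env file: the service is enabled again.
bad_order = merge_resource_registry(my_env, docker_env)

# My env file after docker.yaml: the service stays disabled.
good_order = merge_resource_registry(docker_env, my_env)

if __name__ == "__main__":
    print(bad_order["OS::TripleO::Services::PankoApi"])
    print(good_order["OS::TripleO::Services::PankoApi"])
```

In this model, keeping the file with the OS::Heat::None overrides last on the command line is what keeps the telemetry services disabled.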
OK then, what do you think, should we close this one?
It was caused by a problem with tripleo-upgrade, which has been fixed now.