Bug 1576308 - docker-puppet.json incorrectly generated for fast forward upgrades
Summary: docker-puppet.json incorrectly generated for fast forward upgrades
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Marios Andreou
QA Contact: Arik Chernetsky
Depends On:
Reported: 2018-05-09 08:15 UTC by Yolanda Robla
Modified: 2018-05-11 12:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-05-11 12:52:25 UTC
Target Upstream Version:

Attachments
docker-puppet.json (11.22 KB, text/plain)
2018-05-09 08:18 UTC, Yolanda Robla
docker-images.yaml (5.42 KB, text/plain)
2018-05-09 08:19 UTC, Yolanda Robla

Description Yolanda Robla 2018-05-09 08:15:36 UTC
Description of problem:

I initially deployed OSP 10 with telemetry disabled, and I am now trying to run a fast forward upgrade. During the upgrade, the deployment steps fail because the images for aodh, ceilometer, panko, and gnocchi cannot be found:

        "2018-05-09 07:47:45,309 ERROR: 46773 -- ERROR configuring aodh", 
        "2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring gnocchi", 
        "2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring panko", 
        "2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring ceilometer",

Looking at the details for each one, I can see:

        "2018-05-09 07:47:44,556 WARNING: 46774 -- docker pull failed: error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\"", 
        "2018-05-09 07:47:44,557 WARNING: 46774 -- retrying pulling image: registry.access.redhat.com/rhosp13/openstack-panko-api:latest", 
        "2018-05-09 07:47:44,557 ERROR: 46774 -- Failed to pull image: registry.access.redhat.com/rhosp13/openstack-panko-api:latest", 
        "2018-05-09 07:47:44,557 DEBUG: 46774 -- Trying to pull repository registry.access.redhat.com/rhosp13/openstack-panko-api ... ", 
        "2018-05-09 07:47:44,557 DEBUG: 46774 -- error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\"", 
        "2018-05-09 07:47:44,561 DEBUG: 46774 -- NET_HOST enabled", 
        "2018-05-09 07:47:44,561 DEBUG: 46774 -- Running docker command: /usr/bin/docker run --user root --name docker-puppet-panko --env PUPPET_TAGS=file,file_line,concat,augeas,cron,panko_api_paste_ini,panko_config --env NAME=panko --env HOSTNAME=overcloud-controller-0 --env NO_ARCHIVE= --env STEP=6 --volume /etc/localtime:/etc/localtime:ro --volume /tmp/tmp_lnTRQ:/etc/config.pp:ro,z --volume /etc/puppet/:/tmp/puppet-etc/:ro,z --volume /usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro,z --volume /var/lib/config-data:/var/lib/config-data/:z --volume tripleo_logs:/var/log/tripleo/ --volume /dev/log:/dev/log --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:z --entrypoint /var/lib/docker-puppet/docker-puppet.sh --net host --volume /etc/hosts:/etc/hosts:ro registry.access.redhat.com/rhosp13/openstack-panko-api:latest", 
        "2018-05-09 07:47:45,308 ERROR: 46774 -- Failed running docker-puppet.py for panko", 
        "2018-05-09 07:47:45,308 ERROR: 46774 -- Unable to find image 'registry.access.redhat.com/rhosp13/openstack-panko-api:latest' locally", 
        "Trying to pull repository registry.access.redhat.com/rhosp13/openstack-panko-api ... ", 
        "2018-05-09 07:47:45,308 INFO: 46774 -- Finished processing puppet configs for panko",

It tries to pull the image from registry.access.redhat.com/rhosp13, where it does not exist, when it should be pulling it from the local registry.
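To see at a glance which images are affected, the "Failed to pull image" lines can be collected from the log. The following is a minimal sketch, assuming the log is available as a list of lines in the format shown above; the function name `failed_images` is made up for illustration and is not part of docker-puppet.py.

```python
import re

# Matches the "Failed to pull image" lines from the log excerpt above.
FAILED_PULL = re.compile(r"Failed to pull image: (\S+)")


def failed_images(log_lines):
    """Return the sorted, de-duplicated image references that failed to pull."""
    found = set()
    for line in log_lines:
        match = FAILED_PULL.search(line)
        if match:
            # Drop the trailing quote/comma left over from the JSON-style dump.
            found.add(match.group(1).rstrip('",'))
    return sorted(found)


sample = [
    '"2018-05-09 07:47:44,557 ERROR: 46774 -- Failed to pull image: '
    'registry.access.redhat.com/rhosp13/openstack-panko-api:latest", ',
    '"2018-05-09 07:47:44,561 DEBUG: 46774 -- NET_HOST enabled", ',
]

for image in failed_images(sample):
    print(image)
```

Every reference printed that starts with registry.access.redhat.com is an image that was not resolved against the local registry.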

The underlying issue is that, because I deploy without telemetry, the initial docker-images.yaml I create does not contain any entries for panko, ceilometer, etc. Looking at docker-puppet.json on the controller, for example, I can see:

[{"puppet_tags": "aodh_api_paste_ini,aodh_config", "config_volume": "aodh", "step_config": "include tripleo::profile::base::aodh::api\n\ninclude ::tripleo::profile::base::database::mysql::client", "config_image": "registry.access.redhat.com/rhosp13/openstack-aodh-api:latest"}, 

But for other entries, such as swift, I see:

 {"puppet_tags": "swift_config,swift_container_config,swift_container_sync_realms_config,swift_account_config,swift_object_config,swift_object_expirer_config,rsync::server", "config_volume": "swift", "step_config": "include ::tripleo::profile::base::swift::storage\n\nclass xinetd() {}", "config_image": ""}]

These point to the internal registry, which should be the right thing.

So either all the components should be included in the docker-images.yaml file, or entries for optional components such as telemetry should not be added to docker-puppet.json.
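One way to spot the affected entries is to list every docker-puppet.json entry whose config_image is not served from the expected registry. This is only a sketch: the registry prefix `undercloud.example:8787/` and the swift image reference in the sample are hypothetical placeholders, and only the aodh entry is taken from the file quoted above.

```python
import json


def unexpected_images(entries, expected_prefix):
    """Return (config_volume, config_image) pairs not served from expected_prefix."""
    return [
        (entry.get("config_volume"), entry.get("config_image", ""))
        for entry in entries
        if not entry.get("config_image", "").startswith(expected_prefix)
    ]


# Hypothetical local registry prefix; use the one from your docker-images.yaml.
LOCAL_REGISTRY = "undercloud.example:8787/"

# Reduced sample: the aodh entry mirrors the report, the swift image is made up.
sample = json.loads("""
[
  {"config_volume": "aodh",
   "config_image": "registry.access.redhat.com/rhosp13/openstack-aodh-api:latest"},
  {"config_volume": "swift",
   "config_image": "undercloud.example:8787/rhosp13/openstack-swift-proxy-server:latest"}
]
""")

for volume, image in unexpected_images(sample, LOCAL_REGISTRY):
    print(volume, image)
```

To check a real deployment, load the docker-puppet.json from the controller with `json.load` instead of the inline sample. Entries with an empty config_image are reported as well, since an empty string does not start with the expected prefix.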

Comment 1 Yolanda Robla 2018-05-09 08:18:06 UTC
Created attachment 1433610

Comment 2 Yolanda Robla 2018-05-09 08:19:23 UTC
Created attachment 1433612

Comment 3 Yolanda Robla 2018-05-09 08:25:06 UTC
My env file contains the following settings, which disable exactly the components that fail when pulling the docker image:

    OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml
    OS::TripleO::Compute::Net::SoftwareConfig: nic-configs/compute.yaml
    OS::TripleO::CephStorage::Net::SoftwareConfig: nic-configs/ceph-storage.yaml
    OS::TripleO::NodeUserData: first-boot.yaml
    OS::TripleO::NodeExtraConfigPost: post-install.yaml
    OS::TripleO::Services::CeilometerApi: OS::Heat::None
    OS::TripleO::Services::CeilometerCollector: OS::Heat::None
    OS::TripleO::Services::CeilometerExpirer: OS::Heat::None
    OS::TripleO::Services::CeilometerAgentCentral: OS::Heat::None
    OS::TripleO::Services::CeilometerAgentNotification: OS::Heat::None
    OS::TripleO::Services::CeilometerAgentIpmi: OS::Heat::None
    OS::TripleO::Services::ComputeCeilometerAgent: OS::Heat::None
    OS::TripleO::Services::GnocchiApi: OS::Heat::None
    OS::TripleO::Services::GnocchiMetricd: OS::Heat::None
    OS::TripleO::Services::GnocchiStatsd: OS::Heat::None
    OS::TripleO::Services::AodhApi: OS::Heat::None
    OS::TripleO::Services::AodhEvaluator: OS::Heat::None
    OS::TripleO::Services::AodhNotifier: OS::Heat::None
    OS::TripleO::Services::AodhListener: OS::Heat::None
    OS::TripleO::Services::PankoApi: OS::Heat::None

Comment 4 Marios Andreou 2018-05-10 11:22:04 UTC
Hi Yolanda, taking this to try and triage. We may need to reach out to other folks (telemetry and/or deployment/containers, not sure yet), so can you please clarify when this is happening? From comment #0 it sounds like it happened during the "openstack overcloud upgrade run --nodes Controller", i.e. after you run the ffwd-upgrade prepare, then the ffwd-upgrade run, and then the upgrade step, which fails.

Can you also please include the full command, especially all the -e env files you used on the ffwd-upgrade prepare (for example, to check the ordering of the env files and where the contents of comment #3 are included)?

The OSP10 deployment: were you also setting OS::Heat::None for ceilo there? Or were you using something like https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/disabled/ceilometer-api-disabled.yaml ? Is Ceilometer in your roles_data.yaml? (I am wondering whether this issue goes away if we remove it from there.)

I can't see where the image would be 'defaulted' as it is in your env, and what's more, why/how it defaults to registry.access.redhat.com.
Maybe the containers DFG might know more. I'll bring this to scrum and can reach out to them later.

Comment 5 Yolanda Robla 2018-05-10 12:04:27 UTC
The problem seems to occur when the docker.yaml env file is added after my own environment files. It seems that the resources I disabled are then enabled again. I am not hitting this when I use the CLI commands, but I was hitting it with older versions of tripleo-upgrade, which were adding the docker and docker-ha yaml files.
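The ordering effect can be illustrated with a simplified model: environment files passed with -e are merged in order, and for the same resource_registry key the later file wins. The sketch below assumes that model; the template path `docker/services/panko-api.yaml` is a hypothetical stand-in for whatever docker.yaml maps the service to, and `merge_registries` is not a real Heat or TripleO function.

```python
def merge_registries(*registries):
    """Merge resource_registry mappings in order; later files win on conflicts."""
    merged = {}
    for registry in registries:
        merged.update(registry)
    return merged


KEY = "OS::TripleO::Services::PankoApi"

# From the env file in comment 3: the service is disabled.
user_env = {KEY: "OS::Heat::None"}

# Hypothetical mapping standing in for the entry docker.yaml provides.
docker_env = {KEY: "docker/services/panko-api.yaml"}

# docker.yaml after the user env: the service is enabled again.
print(merge_registries(user_env, docker_env)[KEY])

# User env last: the service stays disabled.
print(merge_registries(docker_env, user_env)[KEY])
```

Under this model, passing the file that sets OS::Heat::None after docker.yaml and docker-ha.yaml keeps the telemetry services disabled.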

Comment 6 Marios Andreou 2018-05-11 10:53:09 UTC
OK then wdyt should we close this one?

Comment 7 Yolanda Robla 2018-05-11 12:52:25 UTC
It was caused by a problem with tripleo-upgrade, which has now been fixed.
