Bug 1576308

Summary: docker-puppet.json incorrectly generated for fast forward upgrades
Product: Red Hat OpenStack Reporter: Yolanda Robla <yroblamo>
Component: openstack-tripleo Assignee: Marios Andreou <mandreou>
Status: CLOSED NOTABUG QA Contact: Arik Chernetsky <achernet>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 13.0 (Queens) CC: lbezdick, mandreou, mburns, mcornea, sathlang, yroblamo
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-11 12:52:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description          Flags
docker-puppet.json   none
docker-images.yaml   none

Description Yolanda Robla 2018-05-09 08:15:36 UTC
Description of problem:

I deployed OSP 10 initially, with telemetry disabled, and I'm trying to execute a fast forward upgrade. When I do, the deployment steps fail because they cannot find the images for aodh, ceilometer, panko and gnocchi:

        "2018-05-09 07:47:45,309 ERROR: 46773 -- ERROR configuring aodh", 
        "2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring gnocchi", 
        "2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring panko", 
        "2018-05-09 07:47:45,310 ERROR: 46773 -- ERROR configuring ceilometer",

Looking at the details for each one, I can see:

        "2018-05-09 07:47:44,556 WARNING: 46774 -- docker pull failed: error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\"", 
        "2018-05-09 07:47:44,557 WARNING: 46774 -- retrying pulling image: registry.access.redhat.com/rhosp13/openstack-panko-api:latest", 
        "2018-05-09 07:47:44,557 ERROR: 46774 -- Failed to pull image: registry.access.redhat.com/rhosp13/openstack-panko-api:latest", 
        "2018-05-09 07:47:44,557 DEBUG: 46774 -- Trying to pull repository registry.access.redhat.com/rhosp13/openstack-panko-api ... ", 
        "2018-05-09 07:47:44,557 DEBUG: 46774 -- error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\"", 
        "2018-05-09 07:47:44,561 DEBUG: 46774 -- NET_HOST enabled", 
        "2018-05-09 07:47:44,561 DEBUG: 46774 -- Running docker command: /usr/bin/docker run --user root --name docker-puppet-panko --env PUPPET_TAGS=file,file_line,concat,augeas,cron,panko_api_paste_ini,panko_config --env NAME=panko --env HOSTNAME=overcloud-controller-0 --env NO_ARCHIVE= --env STEP=6 --volume /etc/localtime:/etc/localtime:ro --volume /tmp/tmp_lnTRQ:/etc/config.pp:ro,z --volume /etc/puppet/:/tmp/puppet-etc/:ro,z --volume /usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro,z --volume /var/lib/config-data:/var/lib/config-data/:z --volume tripleo_logs:/var/log/tripleo/ --volume /dev/log:/dev/log --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:z --entrypoint /var/lib/docker-puppet/docker-puppet.sh --net host --volume /etc/hosts:/etc/hosts:ro registry.access.redhat.com/rhosp13/openstack-panko-api:latest", 
        "2018-05-09 07:47:45,308 ERROR: 46774 -- Failed running docker-puppet.py for panko", 
        "2018-05-09 07:47:45,308 ERROR: 46774 -- Unable to find image 'registry.access.redhat.com/rhosp13/openstack-panko-api:latest' locally", 
        "Trying to pull repository registry.access.redhat.com/rhosp13/openstack-panko-api ... ", 
        "2018-05-09 07:47:45,308 INFO: 46774 -- Finished processing puppet configs for panko",

It tries to pull the image from registry.access.redhat.com/rhosp13, which does not actually exist. It should be pulling the images from the local registry instead.
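
To get a quick list of which images failed to pull, the log lines above can be filtered with a small script. This is only a sketch: the function name failed_images is made up for illustration, and it assumes the "Failed to pull image: <reference>" wording seen in the excerpt above stays the same.

```python
import re

# Matches the "Failed to pull image: <reference>" message from the log excerpt.
# The capture stops at whitespace, quotes or commas so the trailing '",' of the
# JSON-style log output is not included in the image reference.
FAILED_PULL = re.compile(r"Failed to pull image: ([^\s\"',]+)")


def failed_images(log_lines):
    """Return the distinct image references that failed to pull, in order seen."""
    seen = []
    for line in log_lines:
        match = FAILED_PULL.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Two lines copied from the log above; only the second one is a pull failure.
sample = [
    '"2018-05-09 07:47:44,557 WARNING: 46774 -- retrying pulling image: '
    'registry.access.redhat.com/rhosp13/openstack-panko-api:latest", ',
    '"2018-05-09 07:47:44,557 ERROR: 46774 -- Failed to pull image: '
    'registry.access.redhat.com/rhosp13/openstack-panko-api:latest", ',
]

for image in failed_images(sample):
    print(image)
```

Any reference printed here that does not start with the local registry address is a candidate for a missing docker-images.yaml entry.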

The issue is that, because I deploy without telemetry, the initial docker-images.yaml I create doesn't contain any entries for panko, ceilometer, etc. Looking at docker-puppet.json on the controller, for example, I can see:

[{"puppet_tags": "aodh_api_paste_ini,aodh_config", "config_volume": "aodh", "step_config": "include tripleo::profile::base::aodh::api\n\ninclude ::tripleo::profile::base::database::mysql::client", "config_image": "registry.access.redhat.com/rhosp13/openstack-aodh-api:latest"}, 

But for other entries, such as swift, I see:

 {"puppet_tags": "swift_config,swift_container_config,swift_container_sync_realms_config,swift_account_config,swift_object_config,swift_object_expirer_config,rsync::server", "config_volume": "swift", "step_config": "include ::tripleo::profile::base::swift::storage\n\nclass xinetd() {}", "config_image": "192.0.2.1:8787/rhosp13/openstack-swift-proxy-server:2018-05-07.2"}]

This one points to the internal registry, which should be the right thing.

So either all the components need to be included in the docker-images.yaml file properly, or entries for optional components such as telemetry should not be added to docker-puppet.json.
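
The mismatch can be spotted mechanically by listing every docker-puppet.json entry whose config_image is not served by the local registry. The sketch below is illustrative only: the function name is hypothetical, the sample is cut down to the two keys it needs, and the registry address 192.0.2.1:8787 is taken from the swift entry above.

```python
import json


def entries_outside_registry(docker_puppet_json, local_registry):
    """Return (config_volume, config_image) pairs not served by local_registry."""
    entries = json.loads(docker_puppet_json)
    return [
        (entry["config_volume"], entry["config_image"])
        for entry in entries
        if not entry["config_image"].startswith(local_registry + "/")
    ]


# Reduced copies of the aodh and swift entries quoted above.
sample = json.dumps([
    {"config_volume": "aodh",
     "config_image": "registry.access.redhat.com/rhosp13/openstack-aodh-api:latest"},
    {"config_volume": "swift",
     "config_image": "192.0.2.1:8787/rhosp13/openstack-swift-proxy-server:2018-05-07.2"},
])

for volume, image in entries_outside_registry(sample, "192.0.2.1:8787"):
    print(volume, image)
```

With the two sample entries, only the aodh one is reported, which matches the services that fail during the upgrade.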

Comment 1 Yolanda Robla 2018-05-09 08:18:06 UTC
Created attachment 1433610 [details]
docker-puppet.json

Comment 2 Yolanda Robla 2018-05-09 08:19:23 UTC
Created attachment 1433612 [details]
docker-images.yaml

Comment 3 Yolanda Robla 2018-05-09 08:25:06 UTC
My env file contains these settings, which disable exactly the components that fail when fetching their docker images:

resource_registry:
    OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml
    OS::TripleO::Compute::Net::SoftwareConfig: nic-configs/compute.yaml
    OS::TripleO::CephStorage::Net::SoftwareConfig: nic-configs/ceph-storage.yaml
    OS::TripleO::NodeUserData: first-boot.yaml
    OS::TripleO::NodeExtraConfigPost: post-install.yaml
    OS::TripleO::Services::CeilometerApi: OS::Heat::None
    OS::TripleO::Services::CeilometerCollector: OS::Heat::None
    OS::TripleO::Services::CeilometerExpirer: OS::Heat::None
    OS::TripleO::Services::CeilometerAgentCentral: OS::Heat::None
    OS::TripleO::Services::CeilometerAgentNotification: OS::Heat::None
    OS::TripleO::Services::CeilometerAgentIpmi: OS::Heat::None
    OS::TripleO::Services::ComputeCeilometerAgent: OS::Heat::None
    OS::TripleO::Services::GnocchiApi: OS::Heat::None
    OS::TripleO::Services::GnocchiMetricd: OS::Heat::None
    OS::TripleO::Services::GnocchiStatsd: OS::Heat::None
    OS::TripleO::Services::AodhApi: OS::Heat::None
    OS::TripleO::Services::AodhEvaluator: OS::Heat::None
    OS::TripleO::Services::AodhNotifier: OS::Heat::None
    OS::TripleO::Services::AodhListener: OS::Heat::None
    OS::TripleO::Services::PankoApi: OS::Heat::None

Comment 4 Marios Andreou 2018-05-10 11:22:04 UTC
Hi Yolanda, taking this to try and triage. We may need to reach out to other folks (telemetry and/or deployment/containers, not sure yet), so can you please clarify when this is happening? From comment #0 it sounds like it happened during "openstack overcloud upgrade run --nodes Controller", i.e. after you run the ffwd-upgrade prepare, then the ffwd-upgrade run, and then the upgrade step, which fails.

Can you also please include the full command, especially all the -e env files you used on the ffwd-upgrade prepare (for example, to check the ordering of the env files and where the contents of comment #3 are included)?

For the OSP10 deployment, were you also setting OS::Heat::None for ceilometer there? Or were you using something like https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/disabled/ceilometer-api-disabled.yaml ? Is Ceilometer in your roles_data.yaml? (I'm wondering whether this issue goes away if we remove it from there.)

I can't see where the image would be 'defaulted' as it is in your env, and what's more, why/how it defaults to registry.access.redhat.com. Maybe the containers DFG knows more. I'll bring this to scrum and can reach out to them later.

Comment 5 Yolanda Robla 2018-05-10 12:04:27 UTC
The problem seems to occur when the docker.yaml env file is added after my own environment files. It seems that the resources I disabled are then enabled again. I am not hitting this when I use the CLI commands, but I was hitting it with older versions of tripleo-upgrade, which were adding the docker and docker-ha yaml files.
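
The ordering effect can be illustrated with a simplified model in which environment files are applied in the order given and the last file to set a resource_registry key wins. This is a sketch of that precedence rule only, not Heat's actual merge code; the function name is hypothetical and the template value standing in for docker.yaml's mapping is a placeholder.

```python
def merge_resource_registries(*environments):
    """Apply environments in order; a later file overrides earlier keys."""
    merged = {}
    for env in environments:
        merged.update(env.get("resource_registry", {}))
    return merged


SERVICE = "OS::TripleO::Services::PankoApi"

# Mapping from comment 3: the service is disabled.
my_env = {"resource_registry": {SERVICE: "OS::Heat::None"}}

# Placeholder for whatever template docker.yaml maps the service to.
docker_env = {"resource_registry": {SERVICE: "<containerized panko-api template>"}}

# docker.yaml after my env file: the service is enabled again.
print(merge_resource_registries(my_env, docker_env)[SERVICE])

# My env file last: the service stays disabled.
print(merge_resource_registries(docker_env, my_env)[SERVICE])
```

Under this model, passing the file from comment 3 after docker.yaml keeps the telemetry services mapped to OS::Heat::None.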

Comment 6 Marios Andreou 2018-05-11 10:53:09 UTC
OK then wdyt should we close this one?

Comment 7 Yolanda Robla 2018-05-11 12:52:25 UTC
It was caused by a problem in tripleo-upgrade, which has now been fixed.