Description of problem:
After upgrading the overcloud to containerized services using
    overcloud deploy .... -e ~/containers-default-parameters.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml
the docker service on the compute node is in a dead state.

Version-Release number of selected component (if applicable):
OpenStack Pike

How reproducible:

Steps to Reproduce:
1) Deploy undercloud + 1 controller + 1 compute
1.1) wget https://raw.githubusercontent.com/openstack/tripleo-quickstart/master/quickstart.sh
1.2) bash quickstart.sh --install-deps
1.3) bash quickstart.sh --working-dir /var/tmp/foo --teardown all --tags all --release master-tripleo-ci $HOST
2) Grab the overcloud deployment command from overcloud_deploy.log
    openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
        --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute \
        --ceph-storage-flavor oooq_ceph --block-storage-flavor oooq_blockstorage \
        --swift-storage-flavor oooq_objectstorage --timeout 90 \
        -e /home/stack/cloud-names.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
        -e /home/stack/network-environment.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml \
        -e /home/stack/enable-tls.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
        -e /home/stack/inject-trust-anchor.yaml \
        --validation-warnings-fatal --ntp-server pool.ntp.org
3) On the undercloud node:
3.1) sudo chown :stack /var/run/docker.sock
3.2) # download container images
    openstack overcloud container image upload --verbose --config-file /usr/share/tripleo-common/contrib/overcloud_containers.yaml
3.2.1) Check docker images on the local docker registry using "docker images"
3.3) # create an environment file to make overcloud fetch the images from the undercloud
    # (192.168.24.1 is undercloud IP that must be pingable from the overcloud)
    echo > ~/containers-default-parameters.yaml 'parameter_defaults:
      DockerNamespace: 192.168.24.1:8787/tripleoupstream
      DockerNamespaceIsRegistry: true
    '
3.4) Run the upgrade of the overcloud to containerized services
    openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
        --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute \
        --ceph-storage-flavor oooq_ceph --block-storage-flavor oooq_blockstorage \
        --swift-storage-flavor oooq_objectstorage --timeout 90 \
        -e /home/stack/cloud-names.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
        -e /home/stack/network-environment.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml \
        -e /home/stack/enable-tls.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
        -e /home/stack/inject-trust-anchor.yaml \
        --validation-warnings-fatal --ntp-server pool.ntp.org \
        -e ~/containers-default-parameters.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml
3.5) Check the docker service, images, and service containers on the compute and controller nodes
3.6) Run the tempest smoke suite

Actual results:
The docker service on the compute node was dead.

Expected results:
All services moved to docker containers, tempest tests passed.

Additional info:
Undercloud related info http://pastebin.test.redhat.com/472515
Controller related info http://pastebin.test.redhat.com/472516
Compute related info http://pastebin.test.redhat.com/472518
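
The docker service check in step 3.5 could be scripted roughly as follows. This is only a sketch: the function name check_docker_service and the REMOTE_SH override are hypothetical, and it assumes the overcloud nodes are reachable over ssh as the heat-admin user.

```shell
# Report whether the docker service is active on one overcloud node.
# Hypothetical helper: REMOTE_SH lets you swap ssh for another runner.
# Assumes the node accepts ssh logins as heat-admin.
check_docker_service() {
    node=$1
    # "systemctl is-active" prints the unit state (active, inactive, failed, ...)
    state=$(${REMOTE_SH:-ssh} "heat-admin@${node}" systemctl is-active docker 2>/dev/null) || true
    if [ "$state" = "active" ]; then
        echo "${node}: docker active"
    else
        echo "${node}: docker NOT active (${state:-unknown})"
        return 1
    fi
}
```

For example, "check_docker_service compute-0" prints one status line and returns non-zero when the service is not active, so it can gate the tempest run in step 3.6.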
It might be that the compute nodes are not being upgraded entirely, and therefore their services are not switching from bare metal (BM) to containers. This may ultimately be something for the Upgrades DFG to deal with.
Indeed, the default roles_data excludes computes from the main upgrade step:
https://github.com/openstack/tripleo-heat-templates/blob/24a5fd643919bd3197d1ccc7f70273a9a70511e9/roles_data.yaml#L143
Excluding compute from the main step is probably correct, and we should implement the compute part of the upgrade as a separate part of the workflow.
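
To see which roles a given roles_data.yaml keeps out of the main upgrade step, something like the sketch below may help. It assumes the exclusion is expressed as a "disable_upgrade_deployment: True" key on the role entry, which is my recollection of the templates from that release rather than something stated in this report, so check it against the linked line first. The function name is hypothetical.

```shell
# Print the names of roles that carry "disable_upgrade_deployment: True".
# Assumption: roles are top-level list items starting with "- name: <Role>"
# and the flag is an indented key inside the role entry.
roles_excluded_from_upgrade() {
    awk '
        /^- name:/ { role = $3 }                                  # remember current role
        /^ +disable_upgrade_deployment: *[Tt]rue/ { print role }  # flag set on that role
    ' "$1"
}
```

Usage: roles_excluded_from_upgrade /usr/share/openstack-tripleo-heat-templates/roles_data.yaml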
Moving this to the Upgrades DFG.
This was reported back in April, when I was prototyping the upgrade to containerized deployments. At that time the compute upgrade (via upgrade-non-controller.sh) wasn't done at all, so the computes just didn't upgrade. Given how the Upgrades DFG has progressed on the upgrades implementation since then, compute node upgrades should now work via upgrade-non-controller.sh, including enablement of the docker service on computes. Most likely this doesn't need any action on the dev side and we can just retest.
After upgrade:

[root@compute-0 heat-admin]# docker ps
CONTAINER ID  IMAGE                                                                                              COMMAND        CREATED         STATUS                    PORTS  NAMES
b9f97c08326f  rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-cron-docker:20171103.1                "kolla_start"  7 minutes ago   Up 7 minutes                     logrotate_crond
14d575d1d464  rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-nova-compute-docker:20171103.1        "kolla_start"  7 minutes ago   Up 7 minutes (unhealthy)         nova_migration_target
2792baedf241  rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-nova-compute-docker:20171103.1        "kolla_start"  7 minutes ago   Up 7 minutes (healthy)           nova_compute
bfb10f54c32a  rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-nova-libvirt-docker:20171103.1        "kolla_start"  10 minutes ago  Up 10 minutes                    nova_libvirt
d35b17844688  rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-nova-libvirt-docker:20171103.1        "kolla_start"  10 minutes ago  Up 10 minutes                    nova_virtlogd
7188ed13743e  rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-ceilometer-compute-docker:20171103.1  "kolla_start"  33 minutes ago  Up 32 minutes                    ceilometer_agent_compute
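
When retesting, a listing like the one above can be reduced to just the containers whose health check is failing. The helper below is a minimal sketch with a hypothetical name; it only inspects the status text and makes no claim about why a container is unhealthy.

```shell
# Print the names of containers whose status contains "(unhealthy)".
# Reads "<name><TAB><status>" lines on stdin, as produced by:
#   docker ps --format '{{.Names}}\t{{.Status}}'
unhealthy_containers() {
    awk -F '\t' '$2 ~ /\(unhealthy\)/ { print $1 }'
}
```

Usage: docker ps --format '{{.Names}}\t{{.Status}}' | unhealthy_containers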
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3457