Bug 1523707 - [UPDATES] PCS managed containers ain't restarted with latest images [NEEDINFO]
Summary: [UPDATES] PCS managed containers ain't restarted with latest images
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Target Milestone: z2
Target Release: 12.0 (Pike)
Assignee: mathieu bultel
QA Contact: Yurii Prokulevych
Depends On:
Reported: 2017-12-08 16:16 UTC by Yurii Prokulevych
Modified: 2018-03-28 17:15 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-7.0.3-24.el7ost python-tripleoclient-7.3.3-8.el7ost openstack-tripleo-common-7.6.9-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-03-28 17:14:53 UTC
Target Upstream Version:
dnavale: needinfo? (mbultel)


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:0602 None None None 2018-03-28 17:15:58 UTC
OpenStack gerrit 527086 None master: MERGED tripleo-heat-templates: Search for containers within stopped containers. (If38a4f7e25d4d1f4679d9684ad2c0db8475d679b) 2018-01-16 14:59:01 UTC
OpenStack gerrit 527692 None master: MERGED python-tripleoclient: Push the config container outside of heat stack update (Ib101c3c65573fa75182ac81d956161ceeb0422a9) 2018-01-16 14:58:54 UTC
OpenStack gerrit 527699 None master: MERGED tripleo-common: Get the config for update outside of package update (I8a92e4f4cfe8e3567e71f9ab60b4aef4142c3874) 2018-01-16 14:58:47 UTC
Launchpad 1738142 None None None 2017-12-14 08:33:43 UTC

Description Yurii Prokulevych 2017-12-08 16:16:06 UTC
Description of problem:
After a minor update finishes on nodes running pcs services (galera/redis/haproxy),
those containers are not restarted with the latest images.

[root@controller-2 ~]# docker images | grep haproxy
    12.0-20171201.1     3ad3a1214956        6 days ago          781 MB
    pcmklatest          3ad3a1214956        6 days ago          781 MB
    12.0-20171129.1     2a9767dddf79        8 days ago          774.4 MB

[root@controller-2 ~]# docker ps | grep -v 12.0-20171201.1
    CONTAINER ID        IMAGE                                                                         COMMAND                  CREATED             STATUS                          PORTS               NAMES
    a8f4bf26dba1        2a9767dddf79                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       haproxy-bundle-docker-2
    2fc167ecda48        03bca6ccbf7f                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       redis-bundle-docker-2
    7097c4bcc0d8        6d6f0bc78831                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       galera-bundle-docker-2
    2eceb3d172cc        716b358a3921                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes (healthy)                             rabbitmq-bundle-docker-2
    ed6200f0c6a7        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:2.4-4              "/entrypoint.sh"         About an hour ago   Up About an hour                                    ceph-mon-controller-2

[root@controller-2 ~]# docker images | grep redis
    12.0-20171201.1     d0f7ca7536a3        6 days ago          780.7 MB
    pcmklatest          d0f7ca7536a3        6 days ago          780.7 MB
    12.0-20171129.1     03bca6ccbf7f        8 days ago          774.1 MB

[root@controller-2 ~]# docker images | grep rabbit
    12.0-20171201.1     1c544a8ea1af        6 days ago          815.4 MB
    pcmklatest          1c544a8ea1af        6 days ago          815.4 MB
    12.0-20171129.1     716b358a3921        8 days ago          808.8 MB

[root@controller-2 ~]# docker images | grep mari
    12.0-20171201.1     491a0fe8d922        6 days ago          911.5 MB
    pcmklatest          491a0fe8d922        6 days ago          911.5 MB
    12.0-20171129.1     6d6f0bc78831        8 days ago          904.9 MB
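
The mismatch in the output above can be checked mechanically. The following is only a sketch: the image IDs are copied from this report, while the dictionary layout and the `stale_containers` helper are assumptions made for illustration and are not part of any TripleO tooling.

```python
# Image IDs of the running bundle containers, copied from `docker ps` above.
running = {
    "haproxy-bundle-docker-2": "2a9767dddf79",
    "redis-bundle-docker-2": "03bca6ccbf7f",
    "galera-bundle-docker-2": "6d6f0bc78831",
    "rabbitmq-bundle-docker-2": "716b358a3921",
}

# Image IDs carrying the pcmklatest tag, copied from `docker images` above.
# The galera bundle runs the image matched by `grep mari`.
pcmklatest = {
    "haproxy": "3ad3a1214956",
    "redis": "d0f7ca7536a3",
    "galera": "491a0fe8d922",
    "rabbitmq": "1c544a8ea1af",
}


def stale_containers(running, latest):
    """Return (container, running_image, expected_image) for every
    container whose image ID differs from the pcmklatest image ID."""
    stale = []
    for name, image_id in sorted(running.items()):
        # Hypothetical convention: the service name is the container
        # name up to the first dash ("haproxy-bundle-docker-2" -> "haproxy").
        service = name.split("-", 1)[0]
        expected = latest.get(service)
        if expected is not None and image_id != expected:
            stale.append((name, image_id, expected))
    return stale


for name, have, want in stale_containers(running, pcmklatest):
    print("{}: running {}, pcmklatest is {}".format(name, have, want))
```

With the data from this report, all four pcs-managed containers are reported as running an image other than the one tagged pcmklatest.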

From update log we can see that docker run uses correct image:

 u'        "Digest: sha256:cc2420e3dd8d989d0f86dd7dd3912d37d921fb4e0b1376889fbfb42b1b2b66c7", ',
 u'        "2017-12-08 11:10:17,607 DEBUG: 393285 -- NET_HOST enabled", ',
 u'        "2017-12-08 11:10:17,608 DEBUG: 393285 -- Running docker command: /usr/bin/docker run --user root --name docker-puppet-haproxy --health-cmd /bin/true --env PUPPET_TAGS=file,file_line,concat,augeas,cron,haproxy_config --env NAME=haproxy --env HOSTNAME=controller-0 --env NO_ARCHIVE= --env STEP=6 --volume /tmp/tmpRcXUdO:/etc/config.pp:ro --volume /etc/puppet/:/tmp/puppet-etc/:ro --volume /usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro --volume /var/lib/config-data:/var/lib/config-data/:rw --volume tripleo_logs:/var/log/tripleo/ --volume /dev/log:/dev/log --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:rw --volume /etc/ipa/ca.crt:/etc/ipa/ca.crt:ro --volume /etc/pki/tls/private/haproxy:/etc/pki/tls/private/haproxy:ro --volume /etc/pki/tls/certs/haproxy:/etc/pki/tls/certs/haproxy:ro --volume /etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro --entrypoint /var/lib/docker-puppet/docker-puppet.sh --net host --volume /etc/hosts:/etc/hosts:ro", ',
 u'        "2017-12-08 11:10:18,032 DEBUG: 393286 -- Trying to pull repository ... ", ',

Version-Release number of selected component (if applicable):

Steps to Reproduce:
1. Update the undercloud (uc) to 2017-12-01.4
2. Set up the latest repos on the overcloud (oc)
3. Run init-minor-update to set up the heat output
4. Run a minor update of the oc nodes
5. Upload the latest docker images to the uc registry
6. Generate the file with images
7. Run init-minor-update to set up the heat output
8. Run a minor update against the nodes hosting pcs services

Actual results:
Latest images are downloaded to nodes, correctly retagged for pcs.
PCS managed containers are started with previous images.

Expected results:
PCS managed services are restarted with latest images.

Additional info:
Virtual setup: 3controllers + 2computes + 3ceph
Re-run of update command restarts services with correct images.

Comment 2 Sofer Athlan-Guyot 2017-12-11 12:15:03 UTC

so in the ansible run we can see (for haproxy for instance):

u'TASK [Get a list of container using Haproxy image] *****************************',
 u'skipping: []',
 u'TASK [Remove any container using the same Haproxy image] ***********************',
 u'skipping: []',
 u'TASK [Remove previous Haproxy images] ******************************************',
 u'skipping: []',
 u'TASK [Pull latest Haproxy images] **********************************************',
 u'skipping: []',
 u'TASK [Retag pcmklatest to latest Haproxy image] ********************************',
 u'skipping: []',

the crucial tasks are skipped.

Comment 3 Sofer Athlan-Guyot 2017-12-11 12:18:59 UTC
The previous comment should be ignored; this is done later on.

Comment 4 Yurii Prokulevych 2017-12-11 14:05:10 UTC
So the problem seems to be that we stop the pcs cluster at step 1 and search for pcs managed containers at step 2. By then the containers are stopped, and we run 'docker ps -q -f ancestor=<image_id>', which by default shows only running containers.
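
To illustrate the point: `docker ps` lists only running containers unless `-a` (`--all`) is passed. A minimal sketch of building the lookup so that stopped containers are included follows; the helper name `build_ps_command` is hypothetical, and this is not necessarily how the attached review implements it.

```python
def build_ps_command(image_id, include_stopped=True):
    """Build a `docker ps` argument list that returns the IDs of
    containers created from the given image.

    Without -a, containers stopped together with the pcs cluster are
    not listed, so the search for them comes back empty.
    """
    cmd = ["docker", "ps", "-q"]
    if include_stopped:
        cmd.append("-a")  # also list stopped/exited containers
    cmd += ["-f", "ancestor={}".format(image_id)]
    return cmd


# Old haproxy image ID taken from this report.
print(" ".join(build_ps_command("2a9767dddf79")))
```

The resulting list can be passed to `subprocess.check_output` on a node where docker is installed.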

Comment 5 Michele Baldessari 2017-12-11 21:21:15 UTC
So Damien, Yurii and I spent some more time on this. We started from a clean environment and we could not reproduce the problem:
- Each controller did exactly as we expected it and updated to the latest pacemaker image

Tomorrow we will run some more tests. Right now the theory is that some additional steps need to happen for us to see the issue (maybe rerunning some steps like the minor-init-update or the config download multiple times). I think we need to fully understand the root cause before we look at throwing any patches at the problem.

Comment 7 mathieu bultel 2017-12-13 12:39:49 UTC
So the issue here is that the config container is updated before the heat stack update is finished; that's why the config doesn't get all the latest docker images.
The workaround, for GA only, would be to run --init-minor-update twice if we want to update the docker registry file. For 0 day or Z release, I have something that fixes this wrong behavior.
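
The ordering problem described here can be sketched generically: fetch the config only once the stack update has reached a terminal state. This is a rough illustration, not the code from the attached reviews. `get_stack_status` and `fetch_config` are hypothetical callables supplied by the caller; the status strings are the usual Heat stack states.

```python
import time


def fetch_config_after_update(get_stack_status, fetch_config,
                              poll_seconds=5.0, max_polls=120,
                              sleep=time.sleep):
    """Poll the stack status and call fetch_config() only after the
    update has completed, so the config sees the latest image list."""
    for _ in range(max_polls):
        status = get_stack_status()
        if status == "UPDATE_COMPLETE":
            return fetch_config()
        if status == "UPDATE_FAILED":
            raise RuntimeError("stack update failed; not fetching config")
        sleep(poll_seconds)  # still in progress, wait and retry
    raise TimeoutError("stack update did not finish in time")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.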

Comment 8 mathieu bultel 2017-12-14 08:35:26 UTC
LP and master review attached

Comment 26 errata-xmlrpc 2018-03-28 17:14:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

