1523707 – [UPDATES] PCS managed containers ain't restarted with latest images

Summary: [UPDATES] PCS managed containers ain't restarted with latest images

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	z2
Target Release:	12.0 (Pike)
Assignee:	mathieu bultel
QA Contact:	Yurii Prokulevych
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-12-08 16:16 UTC by Yurii Prokulevych
Modified:	2023-09-15 01:26 UTC
CC List:	22 users
Fixed In Version:	openstack-tripleo-heat-templates-7.0.3-24.el7ost python-tripleoclient-7.3.3-8.el7ost openstack-tripleo-common-7.6.9-1.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-03-28 17:14:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1738142	None	None	None	2017-12-14 08:33:43 UTC
OpenStack gerrit	527086	None	master: MERGED	tripleo-heat-templates: Search for containers within stopped containers. (If38a4f7e25d4d1f4679d9684ad2c0db8475d679b)	2018-01-16 14:59:01 UTC
OpenStack gerrit	527692	None	master: MERGED	python-tripleoclient: Push the config container outside of heat stack update (Ib101c3c65573fa75182ac81d956161ceeb0422a9)	2018-01-16 14:58:54 UTC
OpenStack gerrit	527699	None	master: MERGED	tripleo-common: Get the config for update outside of package update (I8a92e4f4cfe8e3567e71f9ab60b4aef4142c3874)	2018-01-16 14:58:47 UTC
Red Hat Issue Tracker	OSP-17013	None	None	None	2022-07-09 10:16:32 UTC
Red Hat Product Errata	RHSA-2018:0602	None	None	None	2018-03-28 17:15:58 UTC

Description Yurii Prokulevych 2017-12-08 16:16:06 UTC

Description of problem:
-----------------------
After a minor update finishes on nodes running pcs services (galera/redis/haproxy),
those containers are not restarted with the latest images.

[root@controller-2 ~]# docker images | grep haproxy
192.168.24.1:8787/rhosp12/openstack-haproxy                  12.0-20171201.1     3ad3a1214956        6 days ago          781 MB
192.168.24.1:8787/rhosp12/openstack-haproxy                  pcmklatest          3ad3a1214956        6 days ago          781 MB
192.168.24.1:8787/rhosp12/openstack-haproxy                  12.0-20171129.1     2a9767dddf79        8 days ago          774.4 MB

[root@controller-2 ~]# docker ps | grep -v 12.0-20171201.1
    CONTAINER ID        IMAGE                                                                         COMMAND                  CREATED             STATUS                          PORTS               NAMES
    a8f4bf26dba1        2a9767dddf79                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       haproxy-bundle-docker-2
    2fc167ecda48        03bca6ccbf7f                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       redis-bundle-docker-2
    7097c4bcc0d8        6d6f0bc78831                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       galera-bundle-docker-2
    2eceb3d172cc        716b358a3921                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes (healthy)                             rabbitmq-bundle-docker-2
    ed6200f0c6a7        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:2.4-4              "/entrypoint.sh"         About an hour ago   Up About an hour                                    ceph-mon-controller-2

[root@controller-2 ~]# docker images | grep redis
192.168.24.1:8787/rhosp12/openstack-redis                   12.0-20171201.1     d0f7ca7536a3        6 days ago          780.7 MB
192.168.24.1:8787/rhosp12/openstack-redis                   pcmklatest          d0f7ca7536a3        6 days ago          780.7 MB
192.168.24.1:8787/rhosp12/openstack-redis                   12.0-20171129.1     03bca6ccbf7f        8 days ago          774.1 MB

[root@controller-2 ~]# docker images | grep rabbit
192.168.24.1:8787/rhosp12/openstack-rabbitmq                12.0-20171201.1     1c544a8ea1af        6 days ago          815.4 MB
192.168.24.1:8787/rhosp12/openstack-rabbitmq                pcmklatest          1c544a8ea1af        6 days ago          815.4 MB
192.168.24.1:8787/rhosp12/openstack-rabbitmq                12.0-20171129.1     716b358a3921        8 days ago          808.8 MB

[root@controller-2 ~]# docker images | grep mari
192.168.24.1:8787/rhosp12/openstack-mariadb                 12.0-20171201.1     491a0fe8d922        6 days ago          911.5 MB
192.168.24.1:8787/rhosp12/openstack-mariadb                 pcmklatest          491a0fe8d922        6 days ago          911.5 MB
192.168.24.1:8787/rhosp12/openstack-mariadb                 12.0-20171129.1     6d6f0bc78831        8 days ago          904.9 MB
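
The mismatch in the listings above can be checked mechanically. The following is a minimal sketch, assuming the whitespace-separated column layout of the `docker images` and `docker ps` output shown here; the function names `outdated_image_ids` and `stale_containers` are hypothetical, and the sample text is an abbreviated copy of the listings above.

```python
def outdated_image_ids(images_output):
    """Return {image_id: repository} for images that are NOT the image
    currently tagged 'pcmklatest' in the same repository."""
    rows = [line.split() for line in images_output.splitlines() if line.strip()]
    rows = [r for r in rows if len(r) >= 3]  # REPOSITORY TAG IMAGE_ID ...
    latest = {r[0]: r[2] for r in rows if r[1] == "pcmklatest"}
    return {r[2]: r[0] for r in rows if r[0] in latest and r[2] != latest[r[0]]}


def stale_containers(ps_output, outdated):
    """Return (container name, image ID) for rows whose IMAGE column
    is one of the outdated image IDs."""
    stale = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line
        image, name = fields[1], fields[-1]  # IMAGE is 2nd column, NAMES is last
        if image in outdated:
            stale.append((name, image))
    return stale


# Sample data copied (abbreviated) from the output above.
IMAGES = """
192.168.24.1:8787/rhosp12/openstack-haproxy  12.0-20171201.1  3ad3a1214956  6 days ago  781 MB
192.168.24.1:8787/rhosp12/openstack-haproxy  pcmklatest       3ad3a1214956  6 days ago  781 MB
192.168.24.1:8787/rhosp12/openstack-haproxy  12.0-20171129.1  2a9767dddf79  8 days ago  774.4 MB
192.168.24.1:8787/rhosp12/openstack-redis    12.0-20171201.1  d0f7ca7536a3  6 days ago  780.7 MB
192.168.24.1:8787/rhosp12/openstack-redis    pcmklatest       d0f7ca7536a3  6 days ago  780.7 MB
192.168.24.1:8787/rhosp12/openstack-redis    12.0-20171129.1  03bca6ccbf7f  8 days ago  774.1 MB
"""

PS = """
CONTAINER ID  IMAGE         COMMAND                 CREATED         STATUS         NAMES
a8f4bf26dba1  2a9767dddf79  "/bin/bash /usr/local"  36 minutes ago  Up 36 minutes  haproxy-bundle-docker-2
2fc167ecda48  03bca6ccbf7f  "/bin/bash /usr/local"  36 minutes ago  Up 36 minutes  redis-bundle-docker-2
ed6200f0c6a7  docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:2.4-4  "/entrypoint.sh"  About an hour ago  Up About an hour  ceph-mon-controller-2
"""

STALE = stale_containers(PS, outdated_image_ids(IMAGES))
for name, image in STALE:
    print("{} still runs outdated image {}".format(name, image))
```

On the sample data this flags the haproxy and redis bundle containers and leaves the ceph-mon container alone, which matches what the listings show by eye.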

From the update log we can see that docker run uses the correct image:
----------------------------------------------------------------------

 u'        "Digest: sha256:cc2420e3dd8d989d0f86dd7dd3912d37d921fb4e0b1376889fbfb42b1b2b66c7", ',
 u'        "2017-12-08 11:10:17,607 DEBUG: 393285 -- NET_HOST enabled", ',
 u'        "2017-12-08 11:10:17,608 DEBUG: 393285 -- Running docker command: /usr/bin/docker run --user root --name docker-puppet-haproxy --health-cmd /bin/true --env PUPPET_TAGS=file,file_line,concat,augeas,cron,haproxy_config --env NAME=haproxy --env HOSTNAME=controller-0 --env NO_ARCHIVE= --env STEP=6 --volume /tmp/tmpRcXUdO:/etc/config.pp:ro --volume /etc/puppet/:/tmp/puppet-etc/:ro --volume /usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro --volume /var/lib/config-data:/var/lib/config-data/:rw --volume tripleo_logs:/var/log/tripleo/ --volume /dev/log:/dev/log --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:rw --volume /etc/ipa/ca.crt:/etc/ipa/ca.crt:ro --volume /etc/pki/tls/private/haproxy:/etc/pki/tls/private/haproxy:ro --volume /etc/pki/tls/certs/haproxy:/etc/pki/tls/certs/haproxy:ro --volume /etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro --entrypoint /var/lib/docker-puppet/docker-puppet.sh --net host --volume /etc/hosts:/etc/hosts:ro 192.168.24.1:8787/rhosp12/openstack-haproxy:12.0-20171201.1", ',
 u'        "2017-12-08 11:10:18,032 DEBUG: 393286 -- Trying to pull repository 192.168.24.1:8787/rhosp12/openstack-ceilometer-central ... ", ',


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-common-7.6.3-8.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-8.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
puppet-tripleo-7.4.3-11.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-18.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-2.el7ost.noarch
python-tripleoclient-7.3.3-7.el7ost.noarch
pcs-0.9.158-6.el7_4.1.x86_64
puppet-pacemaker-0.6.0-2.el7ost.noarch
pacemaker-cli-1.1.16-12.el7_4.5.x86_64
ansible-pacemaker-1.0.3-2.el7ost.noarch
pacemaker-1.1.16-12.el7_4.5.x86_64
userspace-rcu-0.7.16-1.el7cp.x86_64
pacemaker-remote-1.1.16-12.el7_4.5.x86_64
pacemaker-libs-1.1.16-12.el7_4.5.x86_64
pacemaker-cluster-libs-1.1.16-12.el7_4.5.x86_64
docker-rhel-push-plugin-1.12.6-68.gitec8512b.el7.x86_64
python-docker-pycreds-1.10.6-3.el7.noarch
docker-common-1.12.6-68.gitec8512b.el7.x86_64
python-heat-agent-docker-cmd-1.4.0-1.el7ost.noarch
docker-client-1.12.6-68.gitec8512b.el7.x86_64
docker-1.12.6-68.gitec8512b.el7.x86_64
python-docker-py-1.10.6-3.el7.noarch

Steps to Reproduce:
-------------------
1. Update the undercloud (uc) to 2017-12-01.4
2. Set up the latest repos on the overcloud (oc)
3. Run init-minor-update to set up the heat output
4. Run a minor update of the oc nodes
5. Upload the latest docker images to the uc registry
6. Generate the file with images
7. Run init-minor-update to set up the heat output
8. Run a minor update against the nodes hosting pcs services


Actual results:
---------------
The latest images are downloaded to the nodes and correctly retagged for pcs.
PCS managed containers are started with the previous images.


Expected results:
-----------------
PCS managed services are restarted with latest images.

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 ceph
Re-running the update command restarts the services with the correct images.

Comment 2 Sofer Athlan-Guyot 2017-12-11 12:15:03 UTC

Hi,

so in the ansible run we can see (for haproxy for instance):

u'TASK [Get a list of container using Haproxy image] *****************************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Remove any container using the same Haproxy image] ***********************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Remove previous Haproxy images] ******************************************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Pull latest Haproxy images] **********************************************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Retag pcmklatest to latest Haproxy image] ********************************',
 u'skipping: [192.168.24.20]',
 u'',

the crucial tasks are skipped.
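
To list skipped tasks without reading the log by eye, something like the following could be used. It is a minimal sketch, assuming the `TASK [...]` and `skipping: [host]` line shapes seen in the excerpt above; the names `skipped_tasks`, `TASK_RE` and `SKIP_RE` are hypothetical. It only reports skips in the lines it is given and says nothing about whether the same tasks run later in the log.

```python
import re

# Line shapes taken from the Ansible output quoted above.
TASK_RE = re.compile(r"TASK \[(?P<name>.+?)\] \*+")
SKIP_RE = re.compile(r"skipping: \[(?P<host>[^\]]+)\]")


def skipped_tasks(lines):
    """Return (task name, host) pairs for every 'skipping' line,
    attributing each skip to the most recent TASK header."""
    skipped = []
    current = None
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            current = task.group("name")
            continue
        skip = SKIP_RE.search(line)
        if skip and current is not None:
            skipped.append((current, skip.group("host")))
    return skipped


# Two of the tasks from the excerpt above.
LOG = [
    "TASK [Pull latest Haproxy images] **********************************************",
    "skipping: [192.168.24.20]",
    "",
    "TASK [Retag pcmklatest to latest Haproxy image] ********************************",
    "skipping: [192.168.24.20]",
]

for task_name, host in skipped_tasks(LOG):
    print("{}: skipped '{}'".format(host, task_name))
```

Because the patterns are applied with `search`, the same function also works on lines that still carry the `u'...'` wrappers from the captured output.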

Comment 3 Sofer Athlan-Guyot 2017-12-11 12:18:59 UTC

The previous comment should be ignored; this is done later on.

Comment 4 Yurii Prokulevych 2017-12-11 14:05:10 UTC

So the problem seems to be that we stop the pcs cluster at step 1 and search for pcs-managed containers at step 2. By then the containers are stopped, and we run 'docker ps -q -f ancestor=<image_id>', which by default shows only running containers, so the search finds nothing.
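
The difference can be illustrated with a small sketch. It assumes only the documented `docker ps` options `-q`, `-f ancestor=` and `-a`/`--all`; the helper names `find_containers_cmd` and `matching_ids` are hypothetical, and this is an illustration of the behavior described here, not the merged upstream change.

```python
def find_containers_cmd(image_id, include_stopped=True):
    """Build a `docker ps` command that prints the IDs of containers
    created from image_id."""
    cmd = ["docker", "ps", "-q", "-f", "ancestor={}".format(image_id)]
    if include_stopped:
        # -a / --all: also list containers that are not running
        cmd.insert(2, "-a")
    return cmd


def matching_ids(containers, image_id, include_stopped):
    """Toy model of the lookup. `containers` is a list of
    (container_id, image_id, running) tuples."""
    return [
        cid
        for cid, img, running in containers
        if img == image_id and (running or include_stopped)
    ]


# After the pcs cluster is stopped, the haproxy container exists but is not running.
CONTAINERS = [("a8f4bf26dba1", "2a9767dddf79", False)]

print(matching_ids(CONTAINERS, "2a9767dddf79", include_stopped=False))
print(matching_ids(CONTAINERS, "2a9767dddf79", include_stopped=True))
print(" ".join(find_containers_cmd("2a9767dddf79")))
```

In the toy model the lookup without stopped containers returns an empty list, while the lookup that includes them returns the haproxy container. On a real node the command list could be passed to `subprocess.check_output`, which is left out here so the sketch runs without docker.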

Comment 5 Michele Baldessari 2017-12-11 21:21:15 UTC

So Damien, Yurii and I spent some more time on this. We started from a clean environment and we could not reproduce the problem:
- Each controller behaved exactly as we expected and updated to the latest pacemaker image

Tomorrow we will run some more tests. Right now the theory is that some additional steps need to happen for us to see the issue (maybe rerunning some steps like the minor-init-update or the config download multiple times). I think we need to fully understand the root cause before we look at throwing any patches at the problem.

Comment 7 mathieu bultel 2017-12-13 12:39:49 UTC

So the issue here is that the config container is updated before the heat stack update is finished; that's why the config doesn't get all the latest docker images.
The workaround would be to run --init-minor-update twice for GA only if we want to update the docker registry file. For 0 day or Z release, I have something that fixes this wrong behavior.

Comment 8 mathieu bultel 2017-12-14 08:35:26 UTC

LP and master review attached

Comment 26 errata-xmlrpc 2018-03-28 17:14:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0602

Comment 28 Red Hat Bugzilla 2023-09-15 01:26:37 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

achernet
amachluf
aopincar
aschultz
atalmor
augol
chjones
dbecker
dciabrin
dnavale
lbezdick
maandre
mbracho
mbultel
mburns
michele
morazi
rhel-osp-director-maint
sasha
sathlang
skatlapa
tvignaud