Bug 1523707

Summary:	[UPDATES] PCS managed containers ain't restarted with latest images
Product:	Red Hat OpenStack	Reporter:	Yurii Prokulevych <yprokule>
Component:	openstack-tripleo-heat-templates	Assignee:	mathieu bultel <mbultel>
Status:	CLOSED ERRATA	QA Contact:	Yurii Prokulevych <yprokule>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	12.0 (Pike)	CC:	achernet, amachluf, aopincar, aschultz, atalmor, augol, chjones, dbecker, dciabrin, dnavale, lbezdick, maandre, mbracho, mbultel, mburns, michele, morazi, rhel-osp-director-maint, sasha, sathlang, skatlapa, tvignaud
Target Milestone:	z2	Keywords:	Regression, Triaged, ZStream
Target Release:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-7.0.3-24.el7ost python-tripleoclient-7.3.3-8.el7ost openstack-tripleo-common-7.6.9-1.el7ost	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-03-28 17:14:53 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Yurii Prokulevych 2017-12-08 16:16:06 UTC

Description of problem:
-----------------------
After the minor update finishes on nodes running pcs services (galera/redis/haproxy),
those containers are not restarted with the latest images.

[root@controller-2 ~]# docker images | grep haproxy
192.168.24.1:8787/rhosp12/openstack-haproxy                  12.0-20171201.1     3ad3a1214956        6 days ago          781 MB
192.168.24.1:8787/rhosp12/openstack-haproxy                  pcmklatest          3ad3a1214956        6 days ago          781 MB
192.168.24.1:8787/rhosp12/openstack-haproxy                  12.0-20171129.1     2a9767dddf79        8 days ago          774.4 MB

[root@controller-2 ~]# docker ps | grep -v 12.0-20171201.1
    CONTAINER ID        IMAGE                                                                         COMMAND                  CREATED             STATUS                          PORTS               NAMES
    a8f4bf26dba1        2a9767dddf79                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       haproxy-bundle-docker-2
    2fc167ecda48        03bca6ccbf7f                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       redis-bundle-docker-2
    7097c4bcc0d8        6d6f0bc78831                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes                                       galera-bundle-docker-2
    2eceb3d172cc        716b358a3921                                                                  "/bin/bash /usr/local"   36 minutes ago      Up 36 minutes (healthy)                             rabbitmq-bundle-docker-2
    ed6200f0c6a7        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:2.4-4              "/entrypoint.sh"         About an hour ago   Up About an hour                                    ceph-mon-controller-2

[root@controller-2 ~]# docker images | grep redis
192.168.24.1:8787/rhosp12/openstack-redis                   12.0-20171201.1     d0f7ca7536a3        6 days ago          780.7 MB
192.168.24.1:8787/rhosp12/openstack-redis                   pcmklatest          d0f7ca7536a3        6 days ago          780.7 MB
192.168.24.1:8787/rhosp12/openstack-redis                   12.0-20171129.1     03bca6ccbf7f        8 days ago          774.1 MB

[root@controller-2 ~]# docker images | grep rabbit
192.168.24.1:8787/rhosp12/openstack-rabbitmq                12.0-20171201.1     1c544a8ea1af        6 days ago          815.4 MB
192.168.24.1:8787/rhosp12/openstack-rabbitmq                pcmklatest          1c544a8ea1af        6 days ago          815.4 MB
192.168.24.1:8787/rhosp12/openstack-rabbitmq                12.0-20171129.1     716b358a3921        8 days ago          808.8 MB

[root@controller-2 ~]# docker images | grep mari
192.168.24.1:8787/rhosp12/openstack-mariadb                 12.0-20171201.1     491a0fe8d922        6 days ago          911.5 MB
192.168.24.1:8787/rhosp12/openstack-mariadb                 pcmklatest          491a0fe8d922        6 days ago          911.5 MB
192.168.24.1:8787/rhosp12/openstack-mariadb                 12.0-20171129.1     6d6f0bc78831        8 days ago          904.9 MB
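
The listings above can be cross-checked mechanically. The Python sketch below is a hypothetical helper, not part of TripleO. It parses trimmed copies of the controller-2 `docker images` and `docker ps` output and reports every running container whose image ID differs from the ID currently tagged `pcmklatest` for the same repository. Only the `pcmklatest` and `12.0-20171129.1` rows are kept, because the `12.0-20171201.1` rows share the `pcmklatest` IDs. It assumes the standard column order of both commands (repository, tag, ID for `docker images`; container ID, image, ..., name for `docker ps`).

```python
from typing import Dict, List, Tuple

# Trimmed copy of the "docker images" output above (controller-2).
IMAGES = """\
192.168.24.1:8787/rhosp12/openstack-haproxy   pcmklatest       3ad3a1214956  6 days ago  781 MB
192.168.24.1:8787/rhosp12/openstack-haproxy   12.0-20171129.1  2a9767dddf79  8 days ago  774.4 MB
192.168.24.1:8787/rhosp12/openstack-redis     pcmklatest       d0f7ca7536a3  6 days ago  780.7 MB
192.168.24.1:8787/rhosp12/openstack-redis     12.0-20171129.1  03bca6ccbf7f  8 days ago  774.1 MB
192.168.24.1:8787/rhosp12/openstack-rabbitmq  pcmklatest       1c544a8ea1af  6 days ago  815.4 MB
192.168.24.1:8787/rhosp12/openstack-rabbitmq  12.0-20171129.1  716b358a3921  8 days ago  808.8 MB
192.168.24.1:8787/rhosp12/openstack-mariadb   pcmklatest       491a0fe8d922  6 days ago  911.5 MB
192.168.24.1:8787/rhosp12/openstack-mariadb   12.0-20171129.1  6d6f0bc78831  8 days ago  904.9 MB
"""

# Copy of the "docker ps" rows above (header dropped).
PS = """\
a8f4bf26dba1  2a9767dddf79  "/bin/bash /usr/local"  36 minutes ago  Up 36 minutes  haproxy-bundle-docker-2
2fc167ecda48  03bca6ccbf7f  "/bin/bash /usr/local"  36 minutes ago  Up 36 minutes  redis-bundle-docker-2
7097c4bcc0d8  6d6f0bc78831  "/bin/bash /usr/local"  36 minutes ago  Up 36 minutes  galera-bundle-docker-2
2eceb3d172cc  716b358a3921  "/bin/bash /usr/local"  36 minutes ago  Up 36 minutes (healthy)  rabbitmq-bundle-docker-2
ed6200f0c6a7  docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:2.4-4  "/entrypoint.sh"  About an hour ago  Up About an hour  ceph-mon-controller-2
"""


def parse_images(listing: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (image_id -> repository, repository -> pcmklatest image_id)."""
    id_to_repo: Dict[str, str] = {}
    pcmklatest: Dict[str, str] = {}
    for line in listing.splitlines():
        repo, tag, image_id = line.split()[:3]
        id_to_repo[image_id] = repo
        if tag == "pcmklatest":
            pcmklatest[repo] = image_id
    return id_to_repo, pcmklatest


def stale_containers(images: str, ps: str) -> List[str]:
    """Names of running containers that use an older build of a pcmklatest repository."""
    id_to_repo, pcmklatest = parse_images(images)
    stale: List[str] = []
    for line in ps.splitlines():
        fields = line.split()
        image, name = fields[1], fields[-1]
        repo = id_to_repo.get(image)
        # Containers whose image is not a pcmklatest-managed repository
        # (e.g. ceph-mon) are skipped.
        if repo in pcmklatest and pcmklatest[repo] != image:
            stale.append(name)
    return stale


if __name__ == "__main__":
    print(stale_containers(IMAGES, PS))
```

On this data the check flags the haproxy, redis and galera bundles, and also the rabbitmq bundle, which is likewise running its 12.0-20171129.1 image.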

From the update log we can see that docker run uses the correct image:
----------------------------------------------------------------------

 u'        "Digest: sha256:cc2420e3dd8d989d0f86dd7dd3912d37d921fb4e0b1376889fbfb42b1b2b66c7", ',
 u'        "2017-12-08 11:10:17,607 DEBUG: 393285 -- NET_HOST enabled", ',
 u'        "2017-12-08 11:10:17,608 DEBUG: 393285 -- Running docker command: /usr/bin/docker run --user root --name docker-puppet-haproxy --health-cmd /bin/true --env PUPPET_TAGS=file,file_line,concat,augeas,cron,haproxy_config --env NAME=haproxy --env HOSTNAME=controller-0 --env NO_ARCHIVE= --env STEP=6 --volume /tmp/tmpRcXUdO:/etc/config.pp:ro --volume /etc/puppet/:/tmp/puppet-etc/:ro --volume /usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro --volume /var/lib/config-data:/var/lib/config-data/:rw --volume tripleo_logs:/var/log/tripleo/ --volume /dev/log:/dev/log --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:rw --volume /etc/ipa/ca.crt:/etc/ipa/ca.crt:ro --volume /etc/pki/tls/private/haproxy:/etc/pki/tls/private/haproxy:ro --volume /etc/pki/tls/certs/haproxy:/etc/pki/tls/certs/haproxy:ro --volume /etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro --entrypoint /var/lib/docker-puppet/docker-puppet.sh --net host --volume /etc/hosts:/etc/hosts:ro 192.168.24.1:8787/rhosp12/openstack-haproxy:12.0-20171201.1", ',
 u'        "2017-12-08 11:10:18,032 DEBUG: 393286 -- Trying to pull repository 192.168.24.1:8787/rhosp12/openstack-ceilometer-central ... ", ',
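
To confirm from a saved log which image a `docker run` line actually used, one can take the last argument of the logged command. A minimal Python sketch, assuming (as in the log above) that the image reference is the final token and no container command follows it. `image_from_log_line` and `split_image_ref` are hypothetical helper names, and the log line below is an abbreviated copy with most `--volume`/`--env` options dropped.

```python
import shlex
from typing import Tuple

# Abbreviated copy of the DEBUG line above (most options removed).
LOG_LINE = (
    "2017-12-08 11:10:17,608 DEBUG: 393285 -- Running docker command: "
    "/usr/bin/docker run --user root --name docker-puppet-haproxy "
    "--net host --volume /etc/hosts:/etc/hosts:ro "
    "192.168.24.1:8787/rhosp12/openstack-haproxy:12.0-20171201.1"
)


def image_from_log_line(line: str) -> str:
    """Return the image reference passed to 'docker run' in a docker-puppet log line."""
    _, sep, command = line.partition("Running docker command: ")
    if not sep:
        raise ValueError("not a 'Running docker command' line")
    # Assumption: the image is the last argument (no container command follows it).
    return shlex.split(command)[-1]


def split_image_ref(ref: str) -> Tuple[str, str]:
    """Split 'registry:port/repo:tag' into (repository, tag); tag is '' if absent."""
    repo, sep, tag = ref.rpartition(":")
    # If the text after the last ':' contains '/', that ':' was the registry
    # port separator, so the reference carries no tag.
    if not sep or "/" in tag:
        return ref, ""
    return repo, tag


if __name__ == "__main__":
    print(split_image_ref(image_from_log_line(LOG_LINE)))
```

Digest references (`repo@sha256:...`) are not handled by this sketch.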


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-common-7.6.3-8.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-8.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
puppet-tripleo-7.4.3-11.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-18.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-2.el7ost.noarch
python-tripleoclient-7.3.3-7.el7ost.noarch
pcs-0.9.158-6.el7_4.1.x86_64
puppet-pacemaker-0.6.0-2.el7ost.noarch
pacemaker-cli-1.1.16-12.el7_4.5.x86_64
ansible-pacemaker-1.0.3-2.el7ost.noarch
pacemaker-1.1.16-12.el7_4.5.x86_64
userspace-rcu-0.7.16-1.el7cp.x86_64
pacemaker-remote-1.1.16-12.el7_4.5.x86_64
pacemaker-libs-1.1.16-12.el7_4.5.x86_64
pacemaker-cluster-libs-1.1.16-12.el7_4.5.x86_64
docker-rhel-push-plugin-1.12.6-68.gitec8512b.el7.x86_64
python-docker-pycreds-1.10.6-3.el7.noarch
docker-common-1.12.6-68.gitec8512b.el7.x86_64
python-heat-agent-docker-cmd-1.4.0-1.el7ost.noarch
docker-client-1.12.6-68.gitec8512b.el7.x86_64
docker-1.12.6-68.gitec8512b.el7.x86_64
python-docker-py-1.10.6-3.el7.noarch

Steps to Reproduce:
-------------------
1. Update the undercloud (uc) to 2017-12-01.4
2. Set up the latest repos on the overcloud (oc)
3. Run init-minor-update to set up the heat output
4. Run a minor update of the oc nodes
5. Upload the latest docker images to the uc registry
6. Generate a file with the images
7. Run init-minor-update to set up the heat output
8. Run a minor update against the nodes hosting pcs services


Actual results:
---------------
The latest images are downloaded to the nodes and correctly retagged for pcs.
PCS-managed containers are started with the previous images.


Expected results:
-----------------
PCS-managed services are restarted with the latest images.

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 Ceph nodes
Re-running the update command restarts the services with the correct images.

Comment 2 Sofer Athlan-Guyot 2017-12-11 12:15:03 UTC

Hi,

So in the Ansible run we can see (for haproxy, for instance):

u'TASK [Get a list of container using Haproxy image] *****************************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Remove any container using the same Haproxy image] ***********************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Remove previous Haproxy images] ******************************************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Pull latest Haproxy images] **********************************************',
 u'skipping: [192.168.24.20]',
 u'',
 u'TASK [Retag pcmklatest to latest Haproxy image] ********************************',
 u'skipping: [192.168.24.20]',
 u'',

The crucial tasks are skipped.

Comment 3 Sofer Athlan-Guyot 2017-12-11 12:18:59 UTC

The previous comment should be ignored; these tasks are run later on.
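
For readers unfamiliar with these update tasks, the five Haproxy steps named in Comment 2 can be sketched in Python. The report does not show the commands inside the Ansible tasks, so the `docker` invocations below are an assumption about what each task does, and `refresh_pcmk_image`, the `run` callback and the container ID used in the usage example are hypothetical. The sketch passes `-a` to `docker ps`, following the diagnosis in Comment 4. Taking `run` as a parameter lets the sequence be exercised with a fake runner instead of a real Docker daemon.

```python
from typing import Callable, List

# run(argv) executes a command and returns its stdout (hypothetical interface).
Runner = Callable[[List[str]], str]


def refresh_pcmk_image(run: Runner, old_image_id: str, new_image: str, pcmk_image: str) -> None:
    # 1. Get a list of containers using the old image.
    #    Without -a, 'docker ps' lists running containers only.
    ids = run(["docker", "ps", "-a", "-q", "-f", f"ancestor={old_image_id}"]).split()
    # 2. Remove any container using that image.
    if ids:
        run(["docker", "rm", "-f", *ids])
    # 3. Remove the previous image.
    run(["docker", "rmi", "-f", old_image_id])
    # 4. Pull the latest image.
    run(["docker", "pull", new_image])
    # 5. Retag pcmklatest to the latest image.
    run(["docker", "tag", new_image, pcmk_image])


# Usage with a fake runner that records every command instead of executing it.
calls: List[List[str]] = []


def fake_run(argv: List[str]) -> str:
    calls.append(argv)
    # Pretend one (hypothetical) stopped bundle container still uses the old image.
    return "0123456789ab\n" if argv[:2] == ["docker", "ps"] else ""


refresh_pcmk_image(
    fake_run,
    "2a9767dddf79",
    "192.168.24.1:8787/rhosp12/openstack-haproxy:12.0-20171201.1",
    "192.168.24.1:8787/rhosp12/openstack-haproxy:pcmklatest",
)
```

Injecting the runner keeps the ordering of the steps testable on its own, which is the property this bug turned on.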

Comment 4 Yurii Prokulevych 2017-12-11 14:05:10 UTC

So the problem seems to be that we stop the pcs cluster at step 1 and search for pcs-managed containers at step 2. By then the containers are stopped, and we run 'docker ps -q -f ancestor=<image_id>', which by default shows only running containers.
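
The effect can be reproduced with a toy model of `docker ps` filtering. This is a hypothetical simulation, not Docker code. It keeps only the two behaviours that matter here: `docker ps` lists running containers unless `-a`/`--all` is given, and `-f ancestor=<image_id>` selects containers created from that image (real Docker also matches images built on top of it, which the model ignores). The container ID is illustrative; the image ID is the old haproxy image from the report, and the container is stopped, as it would be after pcs stops the cluster at step 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Container:
    container_id: str
    image_id: str
    running: bool


def ps_ancestor(containers: List[Container], image_id: str, all_: bool = False) -> List[str]:
    """Model of 'docker ps -q [-a] -f ancestor=<image_id>' (exact image match only)."""
    return [
        c.container_id
        for c in containers
        if c.image_id == image_id and (all_ or c.running)
    ]


# Hypothetical bundle container, stopped by pcs, still created from the old image.
containers = [Container("0123456789ab", "2a9767dddf79", running=False)]

print(ps_ancestor(containers, "2a9767dddf79"))             # -> []
print(ps_ancestor(containers, "2a9767dddf79", all_=True))  # -> ['0123456789ab']
```

Without `-a` the stopped container is invisible, so the later "remove container" step has nothing to act on.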

Comment 5 Michele Baldessari 2017-12-11 21:21:15 UTC

So Damien, Yurii and I spent some more time on this. We started from a clean environment and could not reproduce the problem:
- Each controller behaved exactly as expected and updated to the latest pacemaker image.

Tomorrow we will run some more tests. Right now the theory is that some additional steps need to happen for us to see the issue (maybe rerunning steps such as init-minor-update or the config download multiple times). I think we need to fully understand the root cause before we look at throwing any patches at the problem.

Comment 7 mathieu bultel 2017-12-13 12:39:49 UTC

So the issue here is that the config container is updated before the Heat stack update has finished; that's why the config doesn't get all the latest docker images.
The workaround for GA would be to run --init-minor-update twice, and only if we want to update the docker registry file. For the 0-day or z-stream release, I have something that fixes this wrong behavior.
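
The ordering problem, and why running --init-minor-update twice works around it, can be illustrated with a toy model. This is a hypothetical Python simulation, not TripleO code: `ToyStack`, `init_minor_update` and the single `haproxy` image key are invented for illustration. The only behaviour it encodes is the one stated above: the config is captured before the stack update that carries the new image tags has finished.

```python
from typing import Dict


class ToyStack:
    """Hypothetical stand-in for the Heat stack holding container image parameters."""

    def __init__(self, images: Dict[str, str]) -> None:
        self.images = dict(images)


def init_minor_update(stack: ToyStack, registry_file: Dict[str, str]) -> Dict[str, str]:
    # The config is captured first, from whatever the stack holds right now...
    config = dict(stack.images)
    # ...and only afterwards does the stack update apply the new registry file.
    stack.images.update(registry_file)
    return config


stack = ToyStack({"haproxy": "openstack-haproxy:12.0-20171129.1"})
new_images = {"haproxy": "openstack-haproxy:12.0-20171201.1"}

first = init_minor_update(stack, new_images)   # config still carries the old tag
second = init_minor_update(stack, new_images)  # config now carries the new tag
```

In the model, the second run sees the images written by the first, which is the effect the workaround relies on; the real fix is to capture the config only after the stack update completes.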

Comment 8 mathieu bultel 2017-12-14 08:35:26 UTC

Launchpad bug and master review attached.

Comment 26 errata-xmlrpc 2018-03-28 17:14:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0602

Comment 28 Red Hat Bugzilla 2023-09-15 01:26:37 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days