Bug 1560731 - Adding a bind-mount to a bundle doesn't restart the associated container
Summary: Adding a bind-mount to a bundle doesn't restart the associated container
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-26 20:59 UTC by Damien Ciabrini
Modified: 2018-03-27 08:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-27 08:22:54 UTC
Target Upstream Version:
Embargoed:


Attachments
crm_report before and after bind-mount change (1.03 MB, application/x-bzip)
2018-03-26 20:59 UTC, Damien Ciabrini

Description Damien Ciabrini 2018-03-26 20:59:12 UTC
Created attachment 1413351
crm_report before and after bind-mount change

Description of problem:
When a bind mount is added to a bundle, pacemaker does not react to the configuration change and does not automatically restart the associated container.

Version-Release number of selected component (if applicable):
1.1.18-11.el7-2b07d5c5a9

How reproducible:
Always


Steps to Reproduce:
1. Create a bundle in the cluster (e.g. start from an OSP 13 deployment).
2. Change the bind-mount configuration:
pcs resource bundle update galera-bundle storage-map add id=mysql-foo source-dir=/foo target-dir=/foo options=rw


Actual results:
The container started for the bundle is not restarted by pacemaker:
5518d1d4a776        192.168.24.1:8787/rhosp13/openstack-mariadb:pcmklatest                       "/bin/bash /usr/lo..."   25 hours ago        Up 25 hours                                   galera-bundle-docker-2

Expected results:
pacemaker should have deleted the container and recreated a new one with the appropriate bind-mount.

Additional info:
Attached crm_report

Comment 2 Andrew Beekhof 2018-03-27 02:29:02 UTC
It appears we at least intended to do a restart:

Mar 26 20:34:05 controller-1 pengine[19471]:   notice:  * Restart    galera-bundle-docker-0 ( controller-1 )   due to resource definition change

The crmd is also under the impression it happened:

Mar 26 20:34:13 controller-1 crmd[19472]:   notice: Initiating stop operation galera-bundle-docker-2_stop_0 on controller-0
Mar 26 20:34:23 controller-1 crmd[19472]:   notice: Initiating stop operation galera-bundle-docker-1_stop_0 on controller-2
Mar 26 20:34:34 controller-1 crmd[19472]:   notice: Initiating stop operation galera-bundle-docker-0_stop_0 locally on controller-1

And on the one node we have logs for we see it completed:

Mar 26 20:34:44 controller-1 crmd[19472]:   notice: Result of stop operation for galera-bundle-docker-0 on controller-1: 0 (ok)

Which is confirmed by docker:

Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.128626799-04:00" level=debug msg="Sending kill signal 9 to container 10ba9787f9c2150b3dd4f9cd92227a635ce64ca216de3f635b7c0c844229c757"
Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.210568911-04:00" level=debug msg="containerd: process exited" id=10ba9787f9c2150b3dd4f9cd92227a635ce64ca216de3f635b7c0c844229c757 pid=init status=137 systemPid=48206
Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.215002739-04:00" level=error msg="containerd: deleting container" error="exit status 1: \"container 10ba9787f9c2150b3dd4f9cd92227a635ce64ca216de3f635b7c0c844229c757 does not exist\\none or more of the container deletions failed\\n\""

And later we see the creation:

Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.490344517-04:00" level=debug msg="Calling POST /v1.26/containers/create?name=galera-bundle-docker-0"


Could you attach journal.log from the other nodes or look for comparable logs on controller-{0,2} please?
I wonder if the delete+create within a short interval is confusing the docker output.
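
For anyone checking the other nodes, a minimal sketch of a filter that pulls the comparable lines out of a journal dump. It only matches the three message shapes quoted above (crmd "Initiating stop operation", crmd "Result of stop operation", and the dockerd "containers/create" call); the function name bundle_container_events is hypothetical, and the bundle name is passed as an argument.

```shell
# Sketch: print only the log lines that show a bundle container being
# stopped by crmd or created by dockerd.
# Reads log text on stdin; $1 is the bundle name (e.g. galera-bundle).
# The patterns are taken from the log excerpts in this comment and are
# not an exhaustive list of pacemaker or docker messages.
bundle_container_events() {
    grep -E "(Initiating|Result of) stop operation (for )?$1-docker-[0-9]+|containers/create\?name=$1-docker-[0-9]+"
}
```

On a controller this could be fed from the journal, for example `journalctl --no-pager | bundle_container_events galera-bundle`, or from the journal.log file in a crm_report.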

Comment 3 Damien Ciabrini 2018-03-27 08:22:54 UTC
Oops, sorry, I obviously did something wrong. I inspected the state of the galera container right after the "pcs resource update" command.

The command is asynchronous: it first has to stop the galera resource itself, then the container. I didn't wait long enough, which is why I got confused and thought pacemaker wasn't behaving as expected.

I reran the test and confirmed that things are working as expected. Closing this bug now.
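
Since the update is asynchronous, a test script has to wait before inspecting the container. A minimal sketch, assuming the docker CLI is available on the node and that a recreated container shows up under the same name with a new ID; the helper names container_id and wait_for_recreate are hypothetical. Note that docker's name filter matches substrings, so a name such as galera-bundle-docker-2 would also match galera-bundle-docker-20.

```shell
# Print the full ID of the running container whose name matches $1
# (empty output if no such container is running).
container_id() {
    docker ps --quiet --no-trunc --filter "name=$1"
}

# Poll once per second until the container ID differs from the old one.
# Arguments: container name, old container ID, max attempts (default 120).
# Prints the new ID and returns 0 on success, returns 1 if attempts run out.
wait_for_recreate() {
    name=$1
    old_id=$2
    tries=${3:-120}
    while [ "$tries" -gt 0 ]; do
        new_id=$(container_id "$name")
        if [ -n "$new_id" ] && [ "$new_id" != "$old_id" ]; then
            printf '%s\n' "$new_id"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage sketch (run on the node hosting the replica):
#   old=$(container_id galera-bundle-docker-2)
#   pcs resource bundle update galera-bundle storage-map add id=mysql-foo source-dir=/foo target-dir=/foo options=rw
#   wait_for_recreate galera-bundle-docker-2 "$old" 300 && echo "container was recreated"
```

Comparing container IDs rather than the "Up ..." column avoids misreading the docker output when the delete and create happen within a short interval.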

