Created attachment 1413351
crm_report before and after bind-mount change

Description of problem:
When a bind mount is added to a bundle, pacemaker does not react to the configuration change and does not automatically restart the associated container.

Version-Release number of selected component (if applicable):
1.1.18-11.el7-2b07d5c5a9

How reproducible:
Always

Steps to Reproduce:
1. Create a bundle in the cluster (e.g. start from an OSP 13 deployment)
2. Change the bind-mount configuration:
   pcs resource bundle update galera-bundle storage-map add id=mysql-foo source-dir=/foo target-dir=/foo options=rw

Actual results:
The container started for the bundle is not restarted by pacemaker:
5518d1d4a776 192.168.24.1:8787/rhosp13/openstack-mariadb:pcmklatest "/bin/bash /usr/lo..." 25 hours ago Up 25 hours galera-bundle-docker-2

Expected results:
pacemaker should have deleted the container and recreated it with the appropriate bind mount.

Additional info:
Attached crm_report
It appears we at least intended to do a restart:

Mar 26 20:34:05 controller-1 pengine[19471]: notice: * Restart galera-bundle-docker-0 ( controller-1 ) due to resource definition change

The crmd is also under the impression it happened:

Mar 26 20:34:13 controller-1 crmd[19472]: notice: Initiating stop operation galera-bundle-docker-2_stop_0 on controller-0
Mar 26 20:34:23 controller-1 crmd[19472]: notice: Initiating stop operation galera-bundle-docker-1_stop_0 on controller-2
Mar 26 20:34:34 controller-1 crmd[19472]: notice: Initiating stop operation galera-bundle-docker-0_stop_0 locally on controller-1

And on the one node we have logs for, we see it completed:

Mar 26 20:34:44 controller-1 crmd[19472]: notice: Result of stop operation for galera-bundle-docker-0 on controller-1: 0 (ok)

Which is confirmed by docker:

Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.128626799-04:00" level=debug msg="Sending kill signal 9 to container 10ba9787f9c2150b3dd4f9cd92227a635ce64ca216de3f635b7c0c844229c757"
Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.210568911-04:00" level=debug msg="containerd: process exited" id=10ba9787f9c2150b3dd4f9cd92227a635ce64ca216de3f635b7c0c844229c757 pid=init status=137 systemPid=48206
Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.215002739-04:00" level=error msg="containerd: deleting container" error="exit status 1: \"container 10ba9787f9c2150b3dd4f9cd92227a635ce64ca216de3f635b7c0c844229c757 does not exist\\none or more of the container deletions failed\\n\""

And later we see the creation:

Mar 26 20:34:44 controller-1 dockerd-current[18334]: time="2018-03-26T16:34:44.490344517-04:00" level=debug msg="Calling POST /v1.26/containers/create?name=galera-bundle-docker-0"

Could you attach journal.log from the other nodes, or look for comparable logs on controller-{0,2}, please? I wonder if the delete+create within a short interval is confusing the docker output.
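
For anyone looking for the comparable lines on the other controllers, here is a rough sketch of how the crmd stop messages could be pulled out of a journal file. The two regular expressions are modelled only on the log lines quoted in this comment, so they are an assumption and may not match other Pacemaker versions; the function name `summarize_stops` is made up for illustration.

```python
import re
import sys

# Patterns derived from the crmd lines quoted above (assumption: the
# wording is the same on the other nodes and in this Pacemaker build).
INITIATED = re.compile(
    r"Initiating stop operation (?P<op>\S+) (?:locally )?on (?P<node>\S+)"
)
COMPLETED = re.compile(
    r"Result of stop operation for (?P<rsc>\S+) on (?P<node>[^:\s]+): "
    r"(?P<rc>\d+) \((?P<status>[^)]*)\)"
)


def summarize_stops(lines):
    """Return (initiated, completed) stop operations found in the log lines.

    initiated: list of (operation key, node)
    completed: list of (resource, node, return code, status text)
    """
    initiated, completed = [], []
    for line in lines:
        match = INITIATED.search(line)
        if match:
            initiated.append((match.group("op"), match.group("node")))
            continue
        match = COMPLETED.search(line)
        if match:
            completed.append(
                (
                    match.group("rsc"),
                    match.group("node"),
                    int(match.group("rc")),
                    match.group("status"),
                )
            )
    return initiated, completed


if __name__ == "__main__":
    # Usage: python summarize_stops.py journal.log
    with open(sys.argv[1], errors="replace") as handle:
        started, finished = summarize_stops(handle)
    for op, node in started:
        print(f"initiated {op} on {node}")
    for rsc, node, rc, status in finished:
        print(f"completed {rsc} on {node}: {rc} ({status})")
```

A stop that shows up as initiated but never as completed on its node would be the first place to look.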
Oops, sorry, I obviously did something wrong...

I inspected the state of the galera container right after running the "pcs resource bundle update" command. The command is asynchronous: pacemaker first has to stop the galera resource itself, and only then the container. I didn't wait long enough, so the old container was still listed and I wrongly concluded that pacemaker wasn't behaving as expected.

I reran the test and confirmed that things are working as expected. Closing this bug now.
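
To avoid the same false alarm when re-testing, one option is to poll until the container ID actually changes instead of checking once right after the update. The following is only a sketch: the helper names, the 120-second timeout and the 2-second polling interval are arbitrary assumptions, and it relies on `docker inspect` failing while the container does not exist.

```python
import subprocess
import time


def container_id(name):
    """Return the full ID of the container called `name`, or None if absent."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.Id}}", name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # no such container at the moment (e.g. between stop and start)
    return result.stdout.strip() or None


def wait_for_recreation(name, old_id, get_id=container_id, timeout=120.0,
                        interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Block until `name` exists with an ID different from `old_id`.

    `old_id` may be the short ID printed by `docker ps`; an ID that merely
    extends it is treated as the same container. Raises TimeoutError if no
    new container appears within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        current = get_id(name)
        if current is not None and not current.startswith(old_id):
            return current
        if clock() >= deadline:
            raise TimeoutError(f"{name} was not recreated within {timeout}s")
        sleep(interval)


if __name__ == "__main__":
    name = "galera-bundle-docker-2"
    before = container_id(name)
    if before is None:
        raise SystemExit(f"{name} is not present; nothing to compare against")
    # Run the "pcs resource bundle update ..." command at this point, then:
    print(wait_for_recreation(name, before))
```

Once the new ID is reported, `docker inspect` on the container can be used to confirm that the added bind mount is present.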