During a minor update of OSP13 I noticed that containers I had manually stopped weren't started back up. This means that if a container is stopped for any reason, paunch will not start it back up.
Paunch does not manage the lifecycle of containers. For OSP13 and OSP14 the docker service manages the lifecycle of containers via the restart policy set in the paunch config. When you manually stopped the containers, did you also change the restart policy? Running "docker inspect <container>" should show what state the container is in. For OSP15, paunch writes out systemd unit files which manage containers via podman.
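For reference, here is a minimal Python sketch of that check (the docker inspect JSON fields used are standard; "memcached" is just an example container name):

```python
import json
import subprocess

def inspect_container(name):
    # "docker inspect" prints a JSON array with one object per container;
    # State.Status and HostConfig.RestartPolicy.Name are standard fields.
    out = subprocess.check_output(["docker", "inspect", name])
    info = json.loads(out)[0]
    return (info["State"]["Status"],
            info["HostConfig"]["RestartPolicy"]["Name"])

# e.g. ('exited', 'always') for a manually stopped container whose
# restart policy is still "always"
print(inspect_container("memcached"))
```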
I'm sorry, my description wasn't clear. The issue is: if "docker stop <service>" happens for any reason (e.g. for memcached), the subsequent stack update for scale up, minor update, or upgrade will not start the container back up. This is because paunch checks whether the container exists ("docker ps -a") to decide whether it should be started.
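In other words, something like this sketch (not the actual paunch source; the container name and image are placeholders):

```python
import subprocess

def container_exists(name):
    # A container that shows up in "docker ps -a" exists, whether it is
    # running or stopped.
    out = subprocess.check_output(
        ["docker", "ps", "-a", "--filter", "name=" + name,
         "--format", "{{.Names}}"])
    return name in out.decode().split()

if not container_exists("memcached"):
    # Only reached when the container is absent entirely, so a
    # stopped-but-present container is never started again.
    subprocess.check_call(
        ["docker", "run", "--detach", "--name", "memcached",
         "memcached:latest"])
```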
Just to clarify further, this is not about lifecycle, but about re-asserting the state via Paunch, which doesn't seem to work as we'd expect:

1) Stop a container
2) Run Paunch, which has the container defined
3) Expect the container to be present and running (Paunch asserting the defined state), but the container is still stopped

If in step 1 we had deleted the container instead of stopping it, then step 2 would start the container. It's counter-intuitive that after deleting the container, Paunch re-asserts the state to match what's defined in the config files, but after stopping the container it doesn't. A sketch of the expected behaviour is below.

In theory this isn't only about updates. The `overcloud deploy` action should ideally put the overcloud into the state defined by t-h-t. So if the user manually stopped some containers, I'd expect them to get started. E.g. Puppet or Ansible would similarly re-assert that services are running.

Does starting the containers when they are stopped make sense within Paunch's scope? I think addressing it anywhere else would be quite hacky.
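For comparison, a sketch of the re-asserting behaviour described above, assuming plain docker CLI calls and a placeholder container name/image:

```python
import subprocess

def status(name):
    # "running", "exited", etc., or None when the container is absent.
    try:
        out = subprocess.check_output(
            ["docker", "inspect", "--format", "{{.State.Status}}", name],
            stderr=subprocess.DEVNULL)
        return out.decode().strip()
    except subprocess.CalledProcessError:
        return None

# Converge on "running" whether the container is stopped or missing.
st = status("memcached")
if st is None:
    subprocess.check_call(["docker", "run", "--detach", "--name",
                           "memcached", "memcached:latest"])
elif st != "running":
    subprocess.check_call(["docker", "start", "memcached"])
```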
Re-assigning to dfg:df to foster the discussion.
Yes, I think paunch should delete stopped containers which are expected to be running, then recreate them. We'll discuss at the triage meeting who will be assigned to this.
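Roughly along these lines (a sketch of the idea only, not an implementation; `expected` is a hypothetical stand-in for the container names defined in the paunch config):

```python
import subprocess

expected = {"memcached"}  # hypothetical: names paunch expects to run

# List containers that exist but have exited, and remove the ones we
# expect to be running; paunch's existing "container missing -> create
# it" path will then recreate them on the next run.
out = subprocess.check_output(
    ["docker", "ps", "-a", "--filter", "status=exited",
     "--format", "{{.Names}}"])
for name in out.decode().split():
    if name in expected:
        subprocess.check_call(["docker", "rm", name])
```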
There appears to be an upstream bug in podman when querying for containers in the `stopped` state. I have opened an issue about it here: https://github.com/containers/libpod/issues/2526
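The query in question is along these lines (illustration only; see the linked issue for the exact behaviour):

```python
import subprocess

# Ask podman for containers in the "stopped" state; this is the sort of
# query affected by the issue linked above.
out = subprocess.check_output(
    ["podman", "ps", "--all", "--filter", "status=stopped",
     "--format", "{{.Names}}"])
print(out.decode())
```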
Steve has shown me an alternative way to query for stopped containers. We are working on getting the new code and tests merged in.
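For illustration, one hypothetical alternative (an assumption on my part, not necessarily the approach Steve showed): inspect each expected container directly instead of filtering `podman ps` by state:

```python
import subprocess

def podman_status(name):
    # Hypothetical alternative: "podman inspect" mirrors docker inspect,
    # so each container's state can be read directly.
    try:
        out = subprocess.check_output(
            ["podman", "inspect", "--format", "{{.State.Status}}", name],
            stderr=subprocess.DEVNULL)
        return out.decode().strip()
    except subprocess.CalledProcessError:
        return None  # container does not exist

print(podman_status("memcached"))  # e.g. "exited"
```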