Bug 1674517

Summary: Paunch does not start stopped containers
Product: Red Hat OpenStack
Component: python-paunch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Lukas Bezdicka <lbezdick>
Assignee: Luke Short <lshort>
QA Contact: nlevinki <nlevinki>
CC: dprince, emacchi, jstransk, lbezdick, lmiccini, lshort, michele, sathlang, sbaker
Keywords: Triaged, ZStream
Target Milestone: ---
Target Release: ---
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2020-02-05 21:40:53 UTC

Description Lukas Bezdicka 2019-02-11 14:23:58 UTC
During a minor update of OSP13 I noticed that containers that I had manually stopped weren't started back up. This means that if for any reason a container is stopped, paunch will not start it back up.

Comment 1 Steve Baker 2019-02-11 22:21:47 UTC
Paunch does not manage the lifecycle of containers. For OSP13 and OSP14 the docker service manages the lifecycle of containers via the restart policy set in the paunch config.

When you manually stopped the containers, did you change the restart policy? Running "docker inspect <container>" should show what state the container is in.

For OSP15 paunch writes out systemd unit files which manage containers via podman.
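For illustration, a unit file of the kind described above might look roughly like this (a hedged sketch only; the actual unit names, paths, and command forms that paunch generates may differ):

```ini
# Hypothetical example of a systemd unit wrapping a podman container,
# e.g. /etc/systemd/system/tripleo_memcached.service
[Unit]
Description=memcached container
After=network.target

[Service]
Restart=always
ExecStart=/usr/bin/podman start -a memcached
ExecStop=/usr/bin/podman stop -t 10 memcached

[Install]
WantedBy=multi-user.target
```

With Restart=always, systemd itself re-asserts the running state if the container process exits, which is why the OSP15 approach sidesteps this class of problem.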

Comment 2 Lukas Bezdicka 2019-02-12 10:38:00 UTC
I'm sorry, my description wasn't clear. The issue is:

If for any reason "docker stop <service>" happens (for a service like memcached), a subsequent stack update for scale-up, a minor update, or an upgrade will not start the container back up. This is because paunch checks whether the container exists ("docker ps -a") to decide whether it should be started up or not.
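The check described above can be sketched in Python (hypothetical function and names, not paunch's actual code):

```python
# Minimal sketch of the existence-only decision described above.
# `existing` models the set of container names reported by `docker ps -a`,
# which includes stopped (Exited) containers.
def needs_create(name, existing):
    """Paunch-style check: act only when the container is entirely absent."""
    return name not in existing

# A stopped container is still listed by `docker ps -a`, so paunch skips it:
print(needs_create("memcached", {"memcached"}))   # False: left stopped
print(needs_create("memcached", set()))           # True: would be created
```

Because the check sees only existence, not state, a container that exists but is stopped is never acted on.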

Comment 3 Jiri Stransky 2019-02-18 14:03:19 UTC
Just to clarify further, this is not about lifecycle, but about re-asserting the state via Paunch, which doesn't seem to work as we'd expect:

1) Stop container

2) Run paunch, which has the container defined

3) Expecting the container to be present and running (paunch asserting the defined state), but the container is still stopped


If in step 1 we'd delete the container instead of stopping it, then step 2 would start the container. It's counter-intuitive that after deleting the container, Paunch re-asserts the state to match what's defined in config files, but after stopping the container it doesn't.

In theory this isn't only about updates. The `overcloud deploy` action should ideally put the overcloud into the state defined by t-h-t. So if the user manually stopped some containers, I'd expect them to get started again. E.g. Puppet or Ansible would similarly re-assert that services are running.

Does starting the containers when they are stopped make sense within Paunch scope? I think addressing it anywhere else would be quite hacky.

Comment 4 Sofer Athlan-Guyot 2019-02-25 17:19:46 UTC
Re-assigning to dfg:df to foster the discussion.

Comment 5 Steve Baker 2019-02-25 20:21:46 UTC
Yes, I think paunch should delete stopped containers which are expected to be running, then recreate them. We'll discuss at the triage meeting who will be assigned to this.
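That suggested behaviour could look roughly like this (a hypothetical sketch under the assumptions above, not paunch's actual implementation):

```python
def converge(desired, actual):
    """Return the actions needed to re-assert the desired container set.

    `desired` is the set of container names defined in the paunch config;
    `actual` maps existing container names to their state, e.g. 'running'
    or 'exited' (as reported by `docker inspect`).
    """
    actions = []
    for name in sorted(desired):
        state = actual.get(name)
        if state is None:
            actions.append(("create", name))
        elif state != "running":
            # Stopped but expected running: delete, then recreate.
            actions.append(("delete", name))
            actions.append(("create", name))
    return actions

print(converge({"memcached"}, {"memcached": "exited"}))
# [('delete', 'memcached'), ('create', 'memcached')]
```

The key difference from the existence-only check is that container state, not just presence, drives the decision.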

Comment 6 Luke Short 2019-03-04 20:11:36 UTC
There appears to be an upstream bug in podman when querying for containers in the `stopped` state. I have opened an upstream issue about this here: https://github.com/containers/libpod/issues/2526
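For context, the query in question can be expressed with a status filter. A hedged sketch of calling the podman CLI from Python (this assumes `podman` is on PATH; `--filter status=exited` and `--format` are standard `podman ps` options):

```python
import subprocess

def parse_names(output):
    """Parse one container name per line from `podman ps` --format output."""
    return [line for line in output.splitlines() if line.strip()]

def stopped_containers():
    """List containers in the Exited state via the podman CLI."""
    out = subprocess.run(
        ["podman", "ps", "-a", "--filter", "status=exited",
         "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_names(out)
```

Filtering on status server-side avoids having to inspect every container individually.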

Comment 7 Luke Short 2019-03-25 16:07:49 UTC
Steve has shown me an alternative way to query for stopped containers. We are working on getting the new code and tests merged in.