Description of problem:
The ceph-mon container keeps restarting and complaining "cannot remove container", and these log messages flush continuously. In fact, only one ceph-mon container is running at that time. As a workaround, "podman restart <container ID>" does not restore the state; only "podman stop <container ID>" does (as sketched below).

-------------------------------------------------------------------------------------
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: Stopped Ceph Monitor.
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: Starting Ceph Monitor...
Sep 4 03:15:47 overcloud-controller-0 podman[709253]: Error: cannot remove container 638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725 as it is running - running or paused containers cannot be removed without force: container state improper
Sep 4 03:15:47 overcloud-controller-0 podman[709276]: Error: error creating container storage: the container name "ceph-mon-overcloud-controller-0" is already in use by "638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725". You have to remove that container to be able to reuse that name.: that name is already in use
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: ceph-mon: Control process exited, code=exited status=125
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: ceph-mon: Failed with result 'exit-code'.
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: Failed to start Ceph Monitor.
-------------------------------------------------------------------------------------

Version-Release number of selected component (if applicable):
-------------------------------------------------------------------------------------
RHOSP 16.1
rhceph-4-rhel8:4-32
podman-1.6.4-15.module+el8.2.0+7290+954fb593.x86_64
podman-docker-1.6.4-15.module+el8.2.0+7290+954fb593.noarch
-------------------------------------------------------------------------------------

How reproducible:
There is no exact reproduction procedure yet; the problem occurs intermittently.

Steps to Reproduce:
1.
2.
3.

Actual results:
The container cannot restart correctly and keeps flushing the error messages above.

Expected results:
The container can restart/start without error.

Additional info:
It seems there could be something wrong with the container state detection inside podman. Could you please help to check?

Regards,
Sam
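For reference, a minimal sketch of the manual workaround described above, using the container name/ID from the logs (actual IDs will differ per node):

-------------------------------------------------------------------------------------
# Find the stale container that is still holding the name
podman ps -a | grep ceph-mon-overcloud-controller-0

# "podman restart <ID>" fails in this state; stopping the stale container is what
# clears it, after which systemd can start the unit again
podman stop 638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725
-------------------------------------------------------------------------------------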
This message:

the container name "ceph-mon-overcloud-controller-0" is already in use by "638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725". You have to remove that container to be able to reuse that name.: that name is already in use

comes from the ceph-mon systemd unit failing to start the ceph-mon container because that container name is already in use. The unit file needs to be updated so that it removes the older container when the name is already taken (as sketched below); once the old container is removed, the new container can start. The old container, 638a..., might not be running correctly, but parts of it are left over and need to be cleaned up.

The unit file should not be hand-edited; it is managed by ceph-ansible. ceph-ansible has been updated in how it manages the unit file to avoid this problem, and the fix was delivered in bz 1858865. That bug also documents that the problem can result in cinder-volume being down.

Ensure you have the errata from bug 1858865 (ceph-ansible-4.0.25.1-1.el8cp) on your UNDERCLOUD and then run a stack update. This will result in ceph-ansible configuring your unit files so that you don't hit this problem.

*** This bug has been marked as a duplicate of bug 1858865 ***
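For context, a minimal sketch of the kind of unit-file change involved, assuming a templated ceph-mon@.service whose instance is the hostname; the real file is generated by ceph-ansible (via the errata above) and should not be edited by hand:

-------------------------------------------------------------------------------------
# /etc/systemd/system/ceph-mon@.service -- illustrative fragment only,
# not the exact template shipped by ceph-ansible
[Service]
# Clean up any leftover container holding the name before starting a new one.
# The leading "-" tells systemd to ignore a non-zero exit (e.g. nothing to remove).
ExecStartPre=-/usr/bin/podman rm -f ceph-mon-%i
ExecStart=/usr/bin/podman run --rm --net=host --name ceph-mon-%i <ceph container image>
Restart=always
-------------------------------------------------------------------------------------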