Description of problem:
The ceph-mon container keeps restarting and complaining "cannot remove container", and these log messages flush continuously. In fact, only one ceph-mon container is running at that time. As a workaround, "podman restart <container ID>" does not restore the state; only "podman stop <container ID>" does (as sketched below).

-------------------------------------------------------------------------------------
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: Stopped Ceph Monitor.
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: Starting Ceph Monitor...
Sep 4 03:15:47 overcloud-controller-0 podman[709253]: Error: cannot remove container 638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725 as it is running - running or paused containers cannot be removed without force: container state improper
Sep 4 03:15:47 overcloud-controller-0 podman[709276]: Error: error creating container storage: the container name "ceph-mon-overcloud-controller-0" is already in use by "638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725". You have to remove that container to be able to reuse that name.: that name is already in use
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: ceph-mon: Control process exited, code=exited status=125
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: ceph-mon: Failed with result 'exit-code'.
Sep 4 03:15:47 overcloud-controller-0 systemd[1]: Failed to start Ceph Monitor.
-------------------------------------------------------------------------------------

Version-Release number of selected component (if applicable):
-------------------------------------------------------------------------------------
RHOSP 16.1
rhceph-4-rhel8:4-32
podman-1.6.4-15.module+el8.2.0+7290+954fb593.x86_64
podman-docker-1.6.4-15.module+el8.2.0+7290+954fb593.noarch
-------------------------------------------------------------------------------------

How reproducible:
There is no exact reproduction procedure yet; the problem occurs intermittently.

Steps to Reproduce:
1.
2.
3.

Actual results:
The container cannot restart correctly and keeps flushing the error messages above.

Expected results:
The container can restart/start without error.

Additional info:
It seems there could be something wrong with the container state detection inside podman. Could you please help to check?

Regards,
Sam
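For reference, a minimal sketch of the manual workaround described above, using the container name/ID from the logs (actual IDs will differ per node):

-------------------------------------------------------------------------------------
# Find the stale container that is still holding the name
podman ps -a | grep ceph-mon-overcloud-controller-0

# "podman restart <ID>" fails in this state; stopping the stale container is what
# clears it, after which systemd can start the unit again
podman stop 638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725
-------------------------------------------------------------------------------------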
This message:

the container name "ceph-mon-overcloud-controller-0" is already in use by "638a2692f6d041eaeb9f66a1d8b85a53c15721c96af74a6eeafb1c319f6d6725". You have to remove that container to be able to reuse that name.: that name is already in use

comes from the ceph-mon systemd unit failing to start the ceph-mon container because that container name is already in use. The unit file needs to be updated so that it removes the older container when the name is already taken (as sketched below); once the old container is removed, the new container can start. The old container, 638a..., might not be running correctly, but parts of it are left over and need to be cleaned up.

The unit file should not be hand-edited; it is managed by ceph-ansible. ceph-ansible has been updated in how it manages the unit file to avoid this problem, and the fix was delivered in bz 1858865. That bug also documents that the problem can result in cinder-volume being down.

Ensure you have the errata from bug 1858865 (ceph-ansible-4.0.25.1-1.el8cp) on your UNDERCLOUD and then run a stack update. This will result in ceph-ansible configuring your unit files so that you don't hit this problem.

*** This bug has been marked as a duplicate of bug 1858865 ***
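For context, a minimal sketch of the kind of unit-file change involved, assuming a templated ceph-mon@.service whose instance is the hostname; the real file is generated by ceph-ansible (via the errata above) and should not be edited by hand:

-------------------------------------------------------------------------------------
# /etc/systemd/system/ceph-mon@.service -- illustrative fragment only,
# not the exact template shipped by ceph-ansible
[Service]
# Clean up any leftover container holding the name before starting a new one.
# The leading "-" tells systemd to ignore a non-zero exit (e.g. nothing to remove).
ExecStartPre=-/usr/bin/podman rm -f ceph-mon-%i
ExecStart=/usr/bin/podman run --rm --net=host --name ceph-mon-%i <ceph container image>
Restart=always
-------------------------------------------------------------------------------------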