Description of problem:

From Sadique Puthen <sputhenp>, Thu, Jun 11, 1:20 PM, to rhos-tech, ceph-osp, Giulio, John:

I am running "openstack overcloud ceph-upgrade .." on the latest version of OSP-13, after running "openstack overcloud update run --nodes CephStorage". It fails with the error below:

2020-06-11 01:01:55,393 p=28354 u=mistral | fatal: [172.16.0.53]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-3", "sh", "-c", "stat /var/run/ceph/ceph-mon.controller-3.asok || stat /var/run/ceph/ceph-mon.controller-3.localdomain.asok"], "delta": "0:00:00.114334", "end": "2020-06-11 05:01:55.382138", "msg": "non-zero return code", "rc": 1, "start": "2020-06-11 05:01:55.267804", "stderr": "stat: cannot stat '/var/run/ceph/ceph-mon.controller-3.asok': No such file or directory\nstat: cannot stat '/var/run/ceph/ceph-mon.controller-3.localdomain.asok': No such file or directory", "stderr_lines": ["stat: cannot stat '/var/run/ceph/ceph-mon.controller-3.asok': No such file or directory", "stat: cannot stat '/var/run/ceph/ceph-mon.controller-3.localdomain.asok': No such file or directory"], "stdout": "", "stdout_lines": []}

Checking manually:

# docker exec ceph-mon-controller-3 sh -c stat /var/run/ceph/ceph-mon.controller-3.asok || stat /var/run/ceph/ceph-mon.controller-3.localdomain.asok
stat: missing operand
Try 'stat --help' for more information.
stat: cannot stat ‘/var/run/ceph/ceph-mon.controller-3.localdomain.asok’: No such file or directory

Neither mon .asok file exists inside the container. The only admin socket in /var/run/ceph is the one for ceph-mgr:

# docker exec -it ceph-mon-controller-3 /bin/bash
# ls /var/run/ceph/
ceph-mgr.controller-3.asok

This is a freshly deployed OSP-13 environment, set up only to test the upgrade. The problem surfaced only at the Ceph upgrade step. Your help is highly appreciated.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
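A note on the manual check above: because the command after "sh -c" is not quoted, the host shell treats "||" as its own operator. Inside the container, "sh -c stat /var/run/ceph/ceph-mon.controller-3.asok" runs a bare "stat" (the socket path becomes $0 of that inner shell, not an argument to stat), which is where "stat: missing operand" comes from; the second stat then runs on the host, not in the container. The Ansible task instead passes the whole "a || b" string to sh -c as a single argument. The sketch below is a hypothetical helper (mon_asok_present is not part of ceph-ansible or OSP) that performs the same two-name check against a directory, with the correctly quoted docker exec form shown in a comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of ceph-ansible or OSP): succeeds if a
# ceph-mon admin socket for HOST exists under DIR, trying the same two
# names as the failing Ansible task (short name, then ".localdomain").
mon_asok_present() {
  dir=$1
  host=$2
  stat "$dir/ceph-mon.$host.asok" >/dev/null 2>&1 ||
    stat "$dir/ceph-mon.$host.localdomain.asok" >/dev/null 2>&1
}

# Usage on a controller: the whole "a || b" expression must reach the
# container's shell as ONE quoted argument to sh -c, as the task does:
#   docker exec ceph-mon-controller-3 sh -c \
#     'stat /var/run/ceph/ceph-mon.controller-3.asok || stat /var/run/ceph/ceph-mon.controller-3.localdomain.asok'
#
# Left unquoted, the host shell keeps "||" for itself, "sh -c stat PATH"
# runs stat with no operand (PATH becomes $0), and the second stat runs
# on the host instead of inside the container.
```

With only ceph-mgr.controller-3.asok present, as in the listing above, the helper fails for controller-3, which matches what the Ansible task reported.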
Created attachment 1697281: ansible log
Here is an update. Steps taken:

1 - Deploy OSP13. Verify that the mon .asok file is present on all controllers.
2 - Upload a Glance image, create a VM, and verify that the environment is working.
3 - Run the upgrade.
  3.1 - Run update prepare using "openstack overcloud update prepare \..".
        SUCCESS. Verify that the mon .asok file is present on all controllers.
  3.2 - Update the controllers:
        # openstack overcloud update run --nodes Controller
        SUCCESS, but the .asok file for the mon container has disappeared from all controllers.

At this point, "docker ps" shows the mon container running and "ceph -s" reports 3 mons, but the journal shows that the mon restart failed during the Controller update. Removing the old ceph-mon-controller-3 container failed because it was still running, so the new container could not be created under the same name, even though systemd logged "Started Ceph Monitor":

Jun 16 02:01:29 controller-3 systemd: Stopped Ceph Monitor.
Jun 16 02:01:29 controller-3 systemd: Starting Ceph Monitor...
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.527339501Z" level=error msg="Handler for DELETE /v1.26/containers/ceph-mon-controller-3 returned error: You cannot remove a running container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. Stop the container before attempting removal or use -f"
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.528166637Z" level=error msg="Handler for DELETE /v1.26/containers/ceph-mon-controller-3 returned error: You cannot remove a running container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. Stop the container before attempting removal or use -f"
Jun 16 02:01:29 controller-3 docker: Error response from daemon: You cannot remove a running container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. Stop the container before attempting removal or use -f
Jun 16 02:01:29 controller-3 systemd: Started Ceph Monitor.
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.565847487Z" level=error msg="Handler for POST /v1.26/containers/create?name=ceph-mon-controller-3 returned error: Conflict. The container name \"/ceph-mon-controller-3\" is already in use by container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. You have to remove (or rename) that container to be able to reuse that name."
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.566825193Z" level=error msg="Handler for POST /v1.26/containers/create returned error: Conflict. The container name \"/ceph-mon-controller-3\" is already in use by container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. You have to remove (or rename) that container to be able to reuse that name."
Jun 16 02:01:29 controller-3 docker: /usr/bin/docker-current: Error response from daemon: Conflict. The container name "/ceph-mon-controller-3" is already in use by container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. You have to remove (or rename) that container to be able to reuse that name..

Despite these errors, step 3.2 is reported as successful:

PLAY RECAP *********************************************************************
controller-1 : ok=314 changed=141 unreachable=0 failed=0
controller-2 : ok=305 changed=138 unreachable=0 failed=0
controller-3 : ok=305 changed=138 unreachable=0 failed=0

Monday 15 June 2020 13:24:27 -0400 (0:00:00.043) 1:20:37.365 ***********
===============================================================================
Updated nodes - Controller
Success

Can you help me understand why the mon restart failed during the Controller update? I tried "docker restart <mon id>", but it did not help.
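In the journal excerpt above, the removal of ceph-mon-controller-3 appears to fail because the container is still running, and the subsequent container create then hits a name conflict with the same container ID. A minimal sketch for spotting this condition on the other controllers, assuming the journal text is available on stdin (for example from "journalctl -b"): the hypothetical conflicting_containers helper prints each distinct container ID that Docker reported as already holding a name. If that ID matches the one returned by "docker ps --no-trunc -q --filter name=ceph-mon-controller-3", that would suggest the mon container still running is the one from before the update.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker or OSP tool): reads journal/syslog text
# on stdin and prints each distinct 64-hex-digit container ID that Docker
# reported in an "already in use by container <id>" name-conflict error.
conflicting_containers() {
  grep -o 'already in use by container [0-9a-f]\{64\}' |
    awk '{ print $NF }' |
    sort -u
}

# Example usage on a controller (journal scope is an assumption; narrow it
# with --since/--until as needed):
#   journalctl -b | conflicting_containers
# then compare with the full ID of the currently running mon container:
#   docker ps --no-trunc -q --filter name=ceph-mon-controller-3
```

Deduplicating with sort -u matters here because dockerd-current and the docker client each log the same conflict, so one failed restart yields several matching lines.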
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3504
*** Bug 1856711 has been marked as a duplicate of this bug. ***
*** Bug 1877815 has been marked as a duplicate of this bug. ***
The needinfo requests on this closed bug have been removed because they remained unresolved for 1000 days.