Bug 1846830
| Summary: | openstack overcloud ceph-upgrade run fails with error "stat: cannot stat '/var/run/ceph/ceph-mon.controller-3.localdomain.asok': No such file or directory" |
|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage |
| Component: | Ceph-Ansible |
| Reporter: | Sadique Puthen <sputhenp> |
| Assignee: | Dimitri Savineau <dsavinea> |
| QA Contact: | Vasishta <vashastr> |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Version: | 3.2 |
| Target Milestone: | z6 |
| Target Release: | 3.3 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | RHEL: ceph-ansible-3.2.44-1.el7cp; Ubuntu: ceph-ansible_3.2.44-2redhat1 |
| Last Closed: | 2020-09-23 12:10:52 UTC |
| Type: | Bug |
| Bug Blocks: | 1578730, 1877815 |
| CC: | aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, fpantano, gabrioux, gfidente, gmeno, jhoylaer, johfulto, lbezdick, mburns, nthomas, ravsingh, rlondhe, rrasouli, sathlang, tchandra, tserlin, ykaul, yrabl |
| Attachments: | ansible log (attachment 1697281) |
Created attachment 1697281: ansible log
Here is the update:

1. Deploy OSP13. Verify the .asok file is present for the mon on all controllers.
2. Test uploading a glance image, create a VM, and verify the environment is working.
3. Run the upgrade:
   - 3.1 Run update prepare using `openstack overcloud update prepare \..` — SUCCESS. Verify the .asok file is still present for the mon on all controllers.
   - 3.2 Update the controllers with `openstack overcloud update run --nodes Controller` — SUCCESS, but the .asok file for the mon container has disappeared from all controllers.

At this point, `docker ps` shows the mon container is running and `ceph -s` shows 3 mons running, but systemd status shows the mon startup failed during the controller update:

```
Jun 16 02:01:29 controller-3 systemd: Stopped Ceph Monitor.
Jun 16 02:01:29 controller-3 systemd: Starting Ceph Monitor...
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.527339501Z" level=error msg="Handler for DELETE /v1.26/containers/ceph-mon-controller-3 returned error: You cannot remove a running container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. Stop the container before attempting removal or use -f"
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.528166637Z" level=error msg="Handler for DELETE /v1.26/containers/ceph-mon-controller-3 returned error: You cannot remove a running container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. Stop the container before attempting removal or use -f"
Jun 16 02:01:29 controller-3 docker: Error response from daemon: You cannot remove a running container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. Stop the container before attempting removal or use -f
Jun 16 02:01:29 controller-3 systemd: Started Ceph Monitor.
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.565847487Z" level=error msg="Handler for POST /v1.26/containers/create?name=ceph-mon-controller-3 returned error: Conflict. The container name \"/ceph-mon-controller-3\" is already in use by container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. You have to remove (or rename) that container to be able to reuse that name."
Jun 16 02:01:29 controller-3 dockerd-current: time="2020-06-16T02:01:29.566825193Z" level=error msg="Handler for POST /v1.26/containers/create returned error: Conflict. The container name \"/ceph-mon-controller-3\" is already in use by container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. You have to remove (or rename) that container to be able to reuse that name."
Jun 16 02:01:29 controller-3 docker: /usr/bin/docker-current: Error response from daemon: Conflict. The container name "/ceph-mon-controller-3" is already in use by container 6738219de56ce6715c8abce77529190ad8403bf886f8a63dcfd73e617a90c874. You have to remove (or rename) that container to be able to reuse that name.
```

Despite this error, the overall step 3.2 shows as succeeded:

```
PLAY RECAP *********************************************************************
controller-1 : ok=314 changed=141 unreachable=0 failed=0
controller-2 : ok=305 changed=138 unreachable=0 failed=0
controller-3 : ok=305 changed=138 unreachable=0 failed=0

Monday 15 June 2020 13:24:27 -0400 (0:00:00.043) 1:20:37.365 ***********
===============================================================================
Updated nodes - Controller Success
```

Can you help us understand why the mon restart failed during the controller update?
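Not part of the original report, but for anyone triaging the same symptom before the fixed ceph-ansible lands: the docker errors above say the old container is still holding the name, so the unit's pre-start cleanup could not recreate it. A minimal manual recovery sketch, assuming the standard ceph-ansible containerized unit name (`ceph-mon@controller-3`, hypothetical here) and that it is safe to restart this mon:

```bash
# Stop the systemd unit so it stops cycling the container.
systemctl stop ceph-mon@controller-3

# Force-remove the stale container still holding the name
# (the daemon refuses plain removal while it is running, per the log above).
docker rm -f ceph-mon-controller-3

# Let systemd recreate the container under the expected name.
systemctl start ceph-mon@controller-3

# Verify the mon admin socket is back inside the container.
docker exec ceph-mon-controller-3 ls /var/run/ceph/
```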
I tried `docker restart <mon id>`, but it did not help.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3504

*** Bug 1856711 has been marked as a duplicate of this bug. ***

*** Bug 1877815 has been marked as a duplicate of this bug. ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
Description of problem:

Sadique Puthen <sputhenp>, Thu, Jun 11, 1:20 PM
to rhos-tech, ceph-osp, Giulio, John

I am running "openstack overcloud ceph-upgrade .." on the latest version of OSP-13 after running "openstack overcloud update run --nodes CephStorage". It fails with the error below:

```
2020-06-11 01:01:55,393 p=28354 u=mistral | fatal: [172.16.0.53]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-3", "sh", "-c", "stat /var/run/ceph/ceph-mon.controller-3.asok || stat /var/run/ceph/ceph-mon.controller-3.localdomain.asok"], "delta": "0:00:00.114334", "end": "2020-06-11 05:01:55.382138", "msg": "non-zero return code", "rc": 1, "start": "2020-06-11 05:01:55.267804", "stderr": "stat: cannot stat '/var/run/ceph/ceph-mon.controller-3.asok': No such file or directory\nstat: cannot stat '/var/run/ceph/ceph-mon.controller-3.localdomain.asok': No such file or directory", "stderr_lines": ["stat: cannot stat '/var/run/ceph/ceph-mon.controller-3.asok': No such file or directory", "stat: cannot stat '/var/run/ceph/ceph-mon.controller-3.localdomain.asok': No such file or directory"], "stdout": "", "stdout_lines": []}
```

Running the check by hand fails as well:

```
# docker exec ceph-mon-controller-3 sh -c stat /var/run/ceph/ceph-mon.controller-3.asok || stat /var/run/ceph/ceph-mon.controller-3.localdomain.asok
stat: missing operand
Try 'stat --help' for more information.
stat: cannot stat ‘/var/run/ceph/ceph-mon.controller-3.localdomain.asok’: No such file or directory
```

Neither .asok file exists inside the container; it only has the asok for ceph-mgr:

```
# docker exec -it ceph-mon-controller-3 /bin/bash
# ls /var/run/ceph/
ceph-mgr.controller-3.asok
```

This was a freshly deployed OSP-13, set up just to test the upgrade. The problem surfaced only during the ceph upgrade step. Your help is highly appreciated.
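As a side note, the manual command above is not an exact reproduction of the ansible task: passed to `sh -c` without quotes, the first `stat` receives no operand (hence "stat: missing operand"), and the `||` branch is evaluated by the host shell rather than inside the container. A quoted sketch of the check the task actually runs:

```bash
# Quote the compound command so the container's shell receives it whole;
# unquoted, `stat` gets no argument and the `||` fallback runs on the host.
docker exec ceph-mon-controller-3 sh -c \
  "stat /var/run/ceph/ceph-mon.controller-3.asok || stat /var/run/ceph/ceph-mon.controller-3.localdomain.asok"
```

Either way the conclusion stands: the `ls /var/run/ceph/` output above shows the mon admin socket is genuinely missing from the container.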