Similar error occurred for MGRs -

failed: [magna113 -> magna113] (item=magna113) => {
    "changed": true,
    "cmd": [
        "/tmp/restart_mgr_daemon.sh"
    ],
    "delta": "0:01:17.305509",
    "end": "2018-02-26 16:47:21.102684",
    "invocation": {
        "module_args": {
            "_raw_params": "/tmp/restart_mgr_daemon.sh",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "magna113",
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2018-02-26 16:46:03.797175",
    "stderr": "Error response from daemon: No such container: ceph-mgr-magna113",
    "stderr_lines": [
        "Error response from daemon: No such container: ceph-mgr-magna113"
    ],
    "stdout": "Socket file /var/run/ceph/ceph1-mgr.magna113.asok could not be found, which means ceph manager is not running.",
    "stdout_lines": [
        "Socket file /var/run/ceph/ceph1-mgr.magna113.asok could not be found, which means ceph manager is not running."
    ]
}

$ sudo docker exec ceph-mgr-magna113 ls /var/run/ceph
ceph1-mgr.magna113.ceph.redhat.com.asok

-----------------------------------------
Execution details - 3.0

The cluster was configured when the hostname was the short hostname; hostnames were then changed to FQDNs, ceph-ansible was updated to ceph-ansible-3.0.26-1.el7cp.noarch, and a rolling update was run.

Moving back to ASSIGNED state. Please let me know if there are any concerns.

Regards,
Vasishta
AQE, Ceph
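The mismatch in the log above can be sketched as follows. This is purely illustrative; the asok path convention and the cluster name "ceph1" are taken from the stdout of the failure and the `docker exec` output.

```shell
# Illustrative sketch only: path convention and cluster name taken
# from the failure output above.
cluster=ceph1
short=magna113
fqdn=magna113.ceph.redhat.com

# What the restart script looks for:
echo "/var/run/ceph/${cluster}-mgr.${short}.asok"
# What the restarted mgr actually created
# (per "docker exec ceph-mgr-magna113 ls /var/run/ceph"):
echo "/var/run/ceph/${cluster}-mgr.${fqdn}.asok"
```

The restart script checks for the short-name socket while the daemon registered under the FQDN, so the check fails even though the mgr container is up.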
Vasishta, the issue is fixed with the latest 3.0 container image. Ken, do we have it? Thanks.

I don't see it in Brew, which sounds weird to me. I committed here: https://bugzilla.redhat.com/show_bug.cgi?id=1546127#c62
Right, the only issue is that registry.access.redhat.com/rhceph/rhceph-3-rhel7 doesn't have the fix, which is present in rhceph:ceph-3.0-rhel-7-docker-candidate-38019-20180222163657. How can we proceed?
Sébastien, QE is testing upgrading from the latest released container to the latest unreleased one.

From: registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest
To: rhceph:ceph-3.0-rhel-7-docker-candidate-38019-20180222163657

When they've verified that the fix for this BZ is working in that container, we will ship the final gold-signed container to customers on registry.access.redhat.com, and it will become rhceph-3-rhel7:latest.

I see both of these images defined on magna113.ceph.redhat.com (in `docker images`), so I imagine that's what Vasishta has been testing here already.
Created attachment 1401136 [details]
File contains contents of ansible-playbook log

Hi Sebastien,

As Ken mentioned, I was trying to upgrade from 3.0 live to ceph-3.0-rhel-7-docker-candidate-38019-20180222163657.

I think that after the mons were upgraded, the mgr was restarted without being upgraded, so an asok file with the FQDN in its name was created, which resulted in this failure.

Regards,
Vasishta Shastry
AQE, Ceph
In this case the failure is expected if you don't apply the workaround. I see your /etc/hostname still has the FQDN; you have to force the short name.
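Deriving the short name from an FQDN can be sketched as below. This is an assumption-laden illustration, not the exact workaround steps from the BZ; the FQDN value is taken from the log above, and the `hostnamectl` step is shown only as a comment because it changes system state.

```shell
# Sketch: derive the short name from an FQDN by stripping everything
# after the first dot (POSIX parameter expansion).
fqdn=magna113.ceph.redhat.com
short=${fqdn%%.*}
echo "$short"    # magna113

# To actually force the short name on the host you would then run,
# for example (hypothetical step, commented out on purpose):
#   hostnamectl set-hostname "$short"
```

With /etc/hostname holding the short name, the mgr registers its admin socket under the name ceph-ansible expects.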
Yes, with the workaround the rolling update worked fine for me.

However, it can be observed that the new asok file created by the new container image doesn't contain the FQDN in its name, even though the hostname is an FQDN. The workaround is needed because ceph-ansible restarts the mgr daemon before updating it, which causes an asok file with the FQDN in its name to be created.

Regards,
Vasishta
AQE, Ceph
You don't see the FQDN anymore because the container enforces the short name.

What's needed so we can move this to VERIFIED?

Thanks
(In reply to leseb from comment #16)
> You don't see the fqdn anymore because the container enforces the shortname.

Though we don't see the FQDN, the upgrade fails because mgr containers are restarted before being upgraded, which results in the creation of an asok file with the FQDN in its name. I will file a separate bug for this.

> What's needed so we can move this to VERIFIED?
>
> Thanks

I will move it to VERIFIED state once it comes ON_QA.

Ceph-ansible - ceph-ansible-3.0.26-1.el7cp
Container - 3.0 live to ceph-3.0-rhel-7-docker-candidate-38019-20180222163657

Regards,
Vasishta Shastry
AQE, Ceph
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0474
*** Bug 1553818 has been marked as a duplicate of this bug. ***