Bug 1590560
| Field | Value |
|---|---|
| Summary | ceph upgrade/deployment fails with "Error response from daemon: No such container: ceph-create-keys" |
| Product | Red Hat OpenStack |
| Component | ceph-ansible |
| Reporter | Marius Cornea <mcornea> |
| Assignee | Sébastien Han <shan> |
| Status | CLOSED ERRATA |
| QA Contact | Yogev Rabl <yrabl> |
| Severity | urgent |
| Priority | urgent |
| Version | 13.0 (Queens) |
| Target Release | 13.0 (Queens) |
| Target Milestone | ga |
| Keywords | Triaged |
| Flags | scohen: needinfo+ |
| Hardware | Unspecified |
| OS | Unspecified |
| CC | ccamacho, dbecker, gabrioux, gfidente, johfulto, knylande, mburns, morazi, nmorell, sasha, sclewis, scohen, yprokule |
| Fixed In Version | ceph-ansible-3.1.0-0.1.rc9.el7cp |
| Doc Type | Known Issue |
| Clones | 1590746 (view as bug list) |
| Bug Depends On | 1590746 |
| Last Closed | 2018-06-27 13:58:15 UTC |
| Type | Bug |

Doc Text:

The ceph-ansible utility does not always remove the ceph-create-keys container from the same node where it was created. Because of this, the deployment may fail with the message "Error response from daemon: No such container: ceph-create-keys." This may affect any ceph-ansible run, including fresh deployments, that has:

* multiple compute nodes, or
* a custom role acting as a Ceph client while also hosting a service that consumes Ceph.
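The failure mode described in the Doc Text is a non-idempotent cleanup: removing a container that was created on a different node (or already removed) aborts the run. The sketch below is a hypothetical illustration of an existence-checked removal, not the actual ceph-ansible fix; the container list is passed in as a parameter here so the logic can run without Docker, whereas in real use it would come from `docker ps -a --format '{{.Names}}'`.

```shell
#!/bin/sh
# Hypothetical sketch (not the actual ceph-ansible change): only attempt
# removal when the container exists on this node, so a missing
# "ceph-create-keys" can never fail with "No such container".
# $1 = container name, $2 = newline-separated list of containers present
# (stand-in for: docker ps -a --format '{{.Names}}')
container_present() {
    printf '%s\n' "$2" | grep -qx -- "$1"
}

remove_if_present() {
    if container_present "$1" "$2"; then
        echo "docker rm -f $1"      # real code would execute the removal here
    else
        echo "$1 not present on this node, skipping"
    fi
}

# demo: the container exists on the first node but not the second
remove_if_present ceph-create-keys "ceph-mon-overcloud
ceph-create-keys"
remove_if_present ceph-create-keys "ceph-mon-overcloud"
```

Either way the task exits zero, which is the property the failed runs were missing.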
Description
Marius Cornea
2018-06-12 21:27:55 UTC
We believe this can be hit for any ceph-ansible run (including fresh deployments) with more than one compute node, or with a custom role behaving as a Ceph client while hosting a service that consumes Ceph.

---

Verified on ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

---

I've seen a very similar example of this problem when using custom roles (CephAll and ControllerNoCeph), where it doesn't find ceph-osd-1:

```
Error response from daemon: No such container: ceph-osd-1
```

I see this error roughly 50% of the time when trying to deploy on an existing overcloud, with ceph-ansible-3.2.0-1.el7cp.noarch. Another noteworthy detail is that it tries to unmount /dev/sda2, which is where my host is installed. The device list is as follows:

```yaml
parameter_defaults:
  CephAnsibleDisksConfig:
    osd_scenario: lvm
    osd_objectstore: bluestore
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
      - /dev/sde
      - /dev/sdf
      - /dev/sdg
      - /dev/sdh
      - /dev/sdi
      - /dev/sdj
      - /dev/sdk
      - /dev/sdl
      - /dev/sdm
      - /dev/sdn
      - /dev/sdo
      - /dev/sdp
      - /dev/sdq
      - /dev/sdr
      - /dev/sds
      - /dev/sdt
      - /dev/sdu
      - /dev/sdv
      - /dev/sdw
```

Full error below, from the failed task's stdout_lines (an identical "rpc error" line appeared ten times; the duplicates are collapsed to one):

```
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"a95f57a637cc\": executable file not found in $PATH"
Socket file /var/run/ceph/ceph-osd.1.asok could not be found, which means the osd daemon is not running. Showing ceph-osd unit logs now:
-- Logs begin at Tue 2019-02-05 07:34:21 UTC, end at Thu 2019-02-07 23:02:09 UTC. --
Feb 05 07:59:53 overcloud-ceph-all-2 systemd[1]: Starting Ceph OSD...
Feb 05 07:59:54 overcloud-ceph-all-2 docker[51388]: Error response from daemon: No such container: ceph-osd-1
Feb 05 07:59:54 overcloud-ceph-all-2 docker[51402]: Error response from daemon: No such container: ceph-osd-1
Feb 05 07:59:54 overcloud-ceph-all-2 systemd[1]: Started Ceph OSD.
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: restorecon /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a --path /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: ln -snf /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a /var/lib/ceph/osd/ceph-1/block
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -R ceph:ceph /dev/mapper/ceph--72de8451--9a9c--4462--b5af--59e442f225fa-osd--data--fe82915e--4d28--4744--988a--bd4250caa54a
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: --> ceph-volume lvm activate successful for osd ID: 1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:21 /entrypoint.sh: SUCCESS
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: exec: PID 57921: spawning /usr/bin/ceph-osd --cluster ceph -f -i 1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: exec: Waiting 57921 to quit
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Feb 05 08:00:22 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:22.410304 7f9197bdfd80 -1 osd.1 0 log_to_monitors {default=true}
Feb 05 08:00:23 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:23.907197 7f917fbe6700 -1 osd.1 0 waiting for initial osdmap
Feb 07 22:57:06 overcloud-ceph-all-2 systemd[1]: Stopping Ceph OSD...
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: managing teardown after SIGTERM
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Sending SIGTERM to PID 57921
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Waiting PID 57921 to terminate .2019-02-07 22:57:06.572019 7f9175bd2700 -1 Fail to read '/proc/309854/cmdline' error = (3) No such process
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:06.572054 7f9175bd2700 -1 received signal: Terminated from PID: 309854 task name: <unknown> UID: 0
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:06.572071 7f9175bd2700 -1 osd.1 100 *** Got signal Terminated ***
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: .2019-02-07 22:57:06.694787 7f9175bd2700 -1 osd.1 100 shutdown
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: ..........................
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Process 57921 is terminated
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: sigterm_cleanup_post
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:09 /entrypoint.sh: osd_volume_activate: Unmounting /dev/sda2
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: umount: /var/lib/ceph: target is busy.
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: (In some cases useful info about processes that use
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: the device is found by lsof(8) or fuser(1))
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:09 /entrypoint.sh: osd_volume_activate: Failed to umount /dev/sda2
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: osd_volume_activate.sh: line 47: lsof: command not found
Feb 07 22:57:09 overcloud-ceph-all-2 docker[309842]: ceph-osd-1
Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Stopped Ceph OSD.
Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Starting Ceph OSD...
Feb 07 22:57:09 overcloud-ceph-all-2 docker[309924]: Error response from daemon: No such container: ceph-osd-1
Feb 07 22:57:09 overcloud-ceph-all-2 docker[309936]: Error response from daemon: No such container: ceph-osd-1
Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Started Ceph OSD.
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: restorecon /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a --path /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: ln -snf /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a /var/lib/ceph/osd/ceph-1/block
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -R ceph:ceph /dev/mapper/ceph--72de8451--9a9c--4462--b5af--59e442f225fa-osd--data--fe82915e--4d28--4744--988a--bd4250caa54a
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: --> ceph-volume lvm activate successful for osd ID: 1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: 2019-02-07 22:57:32 /entrypoint.sh: SUCCESS
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: exec: PID 310325: spawning /usr/bin/ceph-osd --cluster ceph -f -i 1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: exec: Waiting 310325 to quit
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Feb 07 22:57:33 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: 2019-02-07 22:57:33.370792 7fb551040d80 -1 osd.1 100 log_to_monitors {default=true}
```
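The task in the log above fails only after polling for the OSD admin socket ("Socket file /var/run/ceph/ceph-osd.1.asok could not be found"). The following is a rough, hypothetical sketch of that kind of wait loop; the retry count, message text, and parameterized socket path are illustrative, not ceph-ansible's actual task.

```shell
#!/bin/sh
# Hypothetical sketch of an admin-socket wait loop.
# $1 = socket path, $2 = number of attempts (one second apart).
wait_for_admin_socket() {
    sock="$1"; tries="$2"
    n=0
    while [ "$n" -lt "$tries" ]; do
        # a live OSD exposes a unix socket (-S); -e also accepts a plain
        # file so the sketch can be exercised without a running daemon
        if [ -S "$sock" ] || [ -e "$sock" ]; then
            return 0
        fi
        n=$((n + 1))
        sleep 1
    done
    echo "Socket file $sock could not be found, which means the osd daemon is not running" >&2
    return 1
}
```

When the loop times out, surfacing the systemd unit log (as ceph-ansible does here) is what makes the underlying "No such container" and unmount failures visible.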
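The most alarming line in the teardown is osd_volume_activate trying to unmount /dev/sda2, the host's installation disk, and failing with "target is busy". A defensive sketch of the guard one would want there is shown below; it is hypothetical (the function name, messages, and fake-mount-table parameter are illustrative, not the entrypoint's real code) and only ever touches tmpfs mounts under /var/lib/ceph/osd/.

```shell
#!/bin/sh
# Hypothetical guard: unmount an OSD dir only when it is a tmpfs mount
# under /var/lib/ceph/osd/, never an arbitrary device like /dev/sda2.
# $1 = mountpoint, $2 = mount table to consult (defaults to /proc/mounts;
# parameterized here so the logic can be tested with a fake table).
safe_umount_osd_dir() {
    dir="$1"; mtab="${2:-/proc/mounts}"
    case "$dir" in
        /var/lib/ceph/osd/*) ;;
        *) echo "refusing to umount $dir: outside /var/lib/ceph/osd" >&2; return 1 ;;
    esac
    # field 2 of the mount table is the mountpoint, field 3 the fstype
    if awk -v d="$dir" '$2 == d && $3 == "tmpfs" {found=1} END {exit !found}' "$mtab"; then
        echo "would run: umount $dir"   # real code would unmount here
    else
        echo "$dir is not a tmpfs mount, skipping"
    fi
}

# demo against a fake mount table
mtab=$(mktemp)
echo 'tmpfs /var/lib/ceph/osd/ceph-1 tmpfs rw 0 0' > "$mtab"
safe_umount_osd_dir /var/lib/ceph/osd/ceph-1 "$mtab"
rm -f "$mtab"
```

Note also that the entrypoint's fallback diagnostic fails ("lsof: command not found"), so the container image lacked the tool its own error path depends on.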