Bug 1590560

| Summary: | ceph upgrade/deployment fails with "Error response from daemon: No such container: ceph-create-keys" | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
| Component: | ceph-ansible | Assignee: | Sébastien Han <shan> |
| Status: | CLOSED ERRATA | QA Contact: | Yogev Rabl <yrabl> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 13.0 (Queens) | CC: | ccamacho, dbecker, gabrioux, gfidente, johfulto, knylande, mburns, morazi, nmorell, sasha, sclewis, scohen, yprokule |
| Target Milestone: | ga | Keywords: | Triaged |
| Target Release: | 13.0 (Queens) | Flags: | scohen: needinfo+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-ansible-3.1.0-0.1.rc9.el7cp | Doc Type: | Known Issue |
| Doc Text: | The ceph-ansible utility does not always remove the ceph-create-keys container from the same node where it was created. Because of this, the deployment may fail with the message "Error response from daemon: No such container: ceph-create-keys." This may affect any ceph-ansible run, including fresh deployments, that has multiple compute nodes or a custom role acting as a Ceph client while also hosting a service that consumes Ceph. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| Clones: | 1590746 (view as bug list) | Environment: | |
| Last Closed: | 2018-06-27 13:58:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1590746 | | |
| Bug Blocks: | | | |
We believe this can be hit for any ceph-ansible run (including fresh deployments) with more than one compute node, or with a custom role behaving as a Ceph client while also hosting a service that consumes Ceph.

Verified on ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086

I've seen a very similar example of this problem when using custom roles (CephAll and ControllerNoCeph), where it doesn't find ceph-osd-1: "Error response from daemon: No such container: ceph-osd-1".
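Both error messages come down to docker being asked about a container name that does not exist on that particular node. A minimal, purely diagnostic sketch for confirming which nodes actually have (or lack) the container in question, assuming the docker runtime used in these deployments:

```bash
# Run on each overcloud node: list matching containers, whether running or exited
docker ps -a --filter name=ceph-create-keys
docker ps -a --filter name=ceph-osd

# An empty result on a node explains the "No such container" response from the daemon there.
```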
I see this error roughly 50% of the time when trying to do a deploy on an existing overcloud, with ceph-ansible-3.2.0-1.el7cp.noarch.

Another noteworthy detail is that it is trying to unmount /dev/sda2, which is where my host OS is installed. The device list is as follows (a sanity check is sketched after the list):
```yaml
parameter_defaults:
  CephAnsibleDisksConfig:
    osd_scenario: lvm
    osd_objectstore: bluestore
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
      - /dev/sde
      - /dev/sdf
      - /dev/sdg
      - /dev/sdh
      - /dev/sdi
      - /dev/sdj
      - /dev/sdk
      - /dev/sdl
      - /dev/sdm
      - /dev/sdn
      - /dev/sdo
      - /dev/sdp
      - /dev/sdq
      - /dev/sdr
      - /dev/sds
      - /dev/sdt
      - /dev/sdu
      - /dev/sdv
      - /dev/sdw
```
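As noted above, the teardown in the log below tries to unmount /dev/sda2 even though sda is not in this device list. A quick way to confirm which disk actually backs the host OS on the node is sketched here, using standard util-linux commands (these are not part of ceph-ansible):

```bash
# Which device backs the root filesystem? (expected: /dev/sda2 on this node)
findmnt -n -o SOURCE /

# Cross-check all disks and their mountpoints against the devices list above
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT
```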
Full Error Below:
"stdout_lines": [
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"a95f57a637cc\\\": executable file not found in $PATH\"",
"",
"Socket file /var/run/ceph/ceph-osd.1.asok could not be found, which means the osd daemon is not running. Showing ceph-osd unit logs now:",
"-- Logs begin at Tue 2019-02-05 07:34:21 UTC, end at Thu 2019-02-07 23:02:09 UTC. --",
"Feb 05 07:59:53 overcloud-ceph-all-2 systemd[1]: Starting Ceph OSD...",
"Feb 05 07:59:54 overcloud-ceph-all-2 docker[51388]: Error response from daemon: No such container: ceph-osd-1",
"Feb 05 07:59:54 overcloud-ceph-all-2 docker[51402]: Error response from daemon: No such container: ceph-osd-1",
"Feb 05 07:59:54 overcloud-ceph-all-2 systemd[1]: Started Ceph OSD.",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: restorecon /var/lib/ceph/osd/ceph-1",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a --path /var/lib/ceph/osd/ceph-1",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: ln -snf /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a /var/lib/ceph/osd/ceph-1/block",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -R ceph:ceph /dev/mapper/ceph--72de8451--9a9c--4462--b5af--59e442f225fa-osd--data--fe82915e--4d28--4744--988a--bd4250caa54a",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: --> ceph-volume lvm activate successful for osd ID: 1",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:21 /entrypoint.sh: SUCCESS",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: exec: PID 57921: spawning /usr/bin/ceph-osd --cluster ceph -f -i 1",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: exec: Waiting 57921 to quit",
"Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal",
"Feb 05 08:00:22 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:22.410304 7f9197bdfd80 -1 osd.1 0 log_to_monitors {default=true}",
"Feb 05 08:00:23 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:23.907197 7f917fbe6700 -1 osd.1 0 waiting for initial osdmap",
"Feb 07 22:57:06 overcloud-ceph-all-2 systemd[1]: Stopping Ceph OSD...",
"Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: managing teardown after SIGTERM",
"Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Sending SIGTERM to PID 57921",
"Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Waiting PID 57921 to terminate .2019-02-07 22:57:06.572019 7f9175bd2700 -1 Fail to read '/proc/309854/cmdline' error = (3) No such process",
"Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:06.572054 7f9175bd2700 -1 received signal: Terminated from PID: 309854 task name: <unknown> UID: 0",
"Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:06.572071 7f9175bd2700 -1 osd.1 100 *** Got signal Terminated ***",
"Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: .2019-02-07 22:57:06.694787 7f9175bd2700 -1 osd.1 100 shutdown",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: ..........................",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Process 57921 is terminated",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: sigterm_cleanup_post",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:09 /entrypoint.sh: osd_volume_activate: Unmounting /dev/sda2",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: umount: /var/lib/ceph: target is busy.",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: (In some cases useful info about processes that use",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: the device is found by lsof(8) or fuser(1))",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:09 /entrypoint.sh: osd_volume_activate: Failed to umount /dev/sda2",
"Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: osd_volume_activate.sh: line 47: lsof: command not found",
"Feb 07 22:57:09 overcloud-ceph-all-2 docker[309842]: ceph-osd-1",
"Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Stopped Ceph OSD.",
"Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Starting Ceph OSD...",
"Feb 07 22:57:09 overcloud-ceph-all-2 docker[309924]: Error response from daemon: No such container: ceph-osd-1",
"Feb 07 22:57:09 overcloud-ceph-all-2 docker[309936]: Error response from daemon: No such container: ceph-osd-1",
"Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Started Ceph OSD.",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: restorecon /var/lib/ceph/osd/ceph-1",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a --path /var/lib/ceph/osd/ceph-1",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: ln -snf /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a /var/lib/ceph/osd/ceph-1/block",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -R ceph:ceph /dev/mapper/ceph--72de8451--9a9c--4462--b5af--59e442f225fa-osd--data--fe82915e--4d28--4744--988a--bd4250caa54a",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: --> ceph-volume lvm activate successful for osd ID: 1",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: 2019-02-07 22:57:32 /entrypoint.sh: SUCCESS",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: exec: PID 310325: spawning /usr/bin/ceph-osd --cluster ceph -f -i 1",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: exec: Waiting 310325 to quit",
"Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal",
"Feb 07 22:57:33 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: 2019-02-07 22:57:33.370792 7fb551040d80 -1 osd.1 100 log_to_monitors {default=true}"
]
}
```
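The repeated rpc errors above appear to come from the playbook trying to exec into the OSD container while waiting for its admin socket; once that gives up, it prints the "Socket file ... could not be found" message and dumps the unit log. A minimal sketch of the same checks done by hand on the OSD node, assuming the docker runtime and the OSD id 1 / container name ceph-osd-1 taken from the journal messages above (an illustration, not a documented recovery step):

```bash
# Is a container named ceph-osd-1 present at all, running or exited?
docker ps -a --filter name=ceph-osd-1

# If it is running, check for the OSD admin socket inside it
# (socket path taken from the log above)
docker exec ceph-osd-1 test -S /var/run/ceph/ceph-osd.1.asok && echo "osd.1 socket present"
```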
Description of problem:
FFU: ceph upgrade fails during the fast forward process with "Error response from daemon: No such container: ceph-create-keys"

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc8.el7cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 compute + 3 ceph osd nodes
2. Run through the fast forward upgrade procedure
3. Run the ceph upgrade step:

```bash
openstack overcloud ceph-upgrade run \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --stack qe-Cloud-0 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /home/stack/virt/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/enable-tls.yaml \
  -e /home/stack/virt/inject-trust-anchor.yaml \
  -e /home/stack/virt/public_vip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /home/stack/virt/debug.yaml \
  -e /home/stack/cli_opts_params.yaml \
  -e /home/stack/ceph-ansible-env.yaml \
  --ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml' \
  --container-registry-file /home/stack/virt/docker-images.yaml
```

Actual results:
Fails

Expected results:
Completes fine.

Additional info:
Attaching ceph-install-workflow.log.
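When reproducing this, the failing task can be located in the attached ceph-ansible log; on an OSP13 undercloud that log is typically /var/log/mistral/ceph-install-workflow.log (the path is an assumption based on the default TripleO setup, since the report only names the file). A minimal sketch:

```bash
# On the undercloud: find the error in the ceph-ansible workflow log
# (path assumed; the report only gives the file name)
grep -n "No such container: ceph-create-keys" /var/log/mistral/ceph-install-workflow.log
```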