Description of problem:

FFU: ceph upgrade fails during the fast forward upgrade process with "Error response from daemon: No such container: ceph-create-keys"

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc8.el7cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes + 3 ceph OSD nodes
2. Run through the fast forward upgrade procedure
3. Run the ceph upgrade step:

openstack overcloud ceph-upgrade run \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --stack qe-Cloud-0 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /home/stack/virt/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/enable-tls.yaml \
  -e /home/stack/virt/inject-trust-anchor.yaml \
  -e /home/stack/virt/public_vip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /home/stack/virt/debug.yaml \
  -e /home/stack/cli_opts_params.yaml \
  -e /home/stack/ceph-ansible-env.yaml \
  --ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml' \
  --container-registry-file /home/stack/virt/docker-images.yaml

Actual results:
The ceph upgrade step fails.

Expected results:
The ceph upgrade step completes successfully.

Additional info:
Attaching ceph-install-workflow.log.
We believe this can be hit by any ceph-ansible run (including fresh deployments) with more than one compute node, or with a custom role that acts as a ceph client (i.e. hosts a service that consumes ceph).
Verified on ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
I've seen a very similar example of this problem when using custom roles (CephAll and ControllerNoCeph), where it doesn't find ceph-osd-1:

  Error response from daemon: No such container: ceph-osd-1

I see this error roughly 50% of the time when trying to do a deploy on an existing overcloud, with ceph-ansible-3.2.0-1.el7cp.noarch.

Another noteworthy part is that it's trying to unmount /dev/sda2, which is the device the host is installed on. The device list is as follows:

parameter_defaults:
  CephAnsibleDisksConfig:
    osd_scenario: lvm
    osd_objectstore: bluestore
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
      - /dev/sde
      - /dev/sdf
      - /dev/sdg
      - /dev/sdh
      - /dev/sdi
      - /dev/sdj
      - /dev/sdk
      - /dev/sdl
      - /dev/sdm
      - /dev/sdn
      - /dev/sdo
      - /dev/sdp
      - /dev/sdq
      - /dev/sdr
      - /dev/sds
      - /dev/sdt
      - /dev/sdu
      - /dev/sdv
      - /dev/sdw

Full error below ("stdout_lines" from the failed task, unescaped for readability):

rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"a95f57a637cc\": executable file not found in $PATH"
(the line above is repeated 10 times in total)

Socket file /var/run/ceph/ceph-osd.1.asok could not be found, which means the osd daemon is not running. Showing ceph-osd unit logs now:

-- Logs begin at Tue 2019-02-05 07:34:21 UTC, end at Thu 2019-02-07 23:02:09 UTC. --
Feb 05 07:59:53 overcloud-ceph-all-2 systemd[1]: Starting Ceph OSD...
Feb 05 07:59:54 overcloud-ceph-all-2 docker[51388]: Error response from daemon: No such container: ceph-osd-1
Feb 05 07:59:54 overcloud-ceph-all-2 docker[51402]: Error response from daemon: No such container: ceph-osd-1
Feb 05 07:59:54 overcloud-ceph-all-2 systemd[1]: Started Ceph OSD.
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: restorecon /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a --path /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: ln -snf /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a /var/lib/ceph/osd/ceph-1/block
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -R ceph:ceph /dev/mapper/ceph--72de8451--9a9c--4462--b5af--59e442f225fa-osd--data--fe82915e--4d28--4744--988a--bd4250caa54a
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: --> ceph-volume lvm activate successful for osd ID: 1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:21 /entrypoint.sh: SUCCESS
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: exec: PID 57921: spawning /usr/bin/ceph-osd --cluster ceph -f -i 1
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: exec: Waiting 57921 to quit
Feb 05 08:00:21 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Feb 05 08:00:22 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:22.410304 7f9197bdfd80 -1 osd.1 0 log_to_monitors {default=true}
Feb 05 08:00:23 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-05 08:00:23.907197 7f917fbe6700 -1 osd.1 0 waiting for initial osdmap
Feb 07 22:57:06 overcloud-ceph-all-2 systemd[1]: Stopping Ceph OSD...
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: managing teardown after SIGTERM
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Sending SIGTERM to PID 57921
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Waiting PID 57921 to terminate .2019-02-07 22:57:06.572019 7f9175bd2700 -1 Fail to read '/proc/309854/cmdline' error = (3) No such process
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:06.572054 7f9175bd2700 -1 received signal: Terminated from PID: 309854 task name: <unknown> UID: 0
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:06.572071 7f9175bd2700 -1 osd.1 100 *** Got signal Terminated ***
Feb 07 22:57:06 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: .2019-02-07 22:57:06.694787 7f9175bd2700 -1 osd.1 100 shutdown
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: ..........................
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: teardown: Process 57921 is terminated
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: sigterm_cleanup_post
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:09 /entrypoint.sh: osd_volume_activate: Unmounting /dev/sda2
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: umount: /var/lib/ceph: target is busy.
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: (In some cases useful info about processes that use
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: the device is found by lsof(8) or fuser(1))
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: 2019-02-07 22:57:09 /entrypoint.sh: osd_volume_activate: Failed to umount /dev/sda2
Feb 07 22:57:09 overcloud-ceph-all-2 ceph-osd-run.sh[51416]: osd_volume_activate.sh: line 47: lsof: command not found
Feb 07 22:57:09 overcloud-ceph-all-2 docker[309842]: ceph-osd-1
Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Stopped Ceph OSD.
Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Starting Ceph OSD...
Feb 07 22:57:09 overcloud-ceph-all-2 docker[309924]: Error response from daemon: No such container: ceph-osd-1
Feb 07 22:57:09 overcloud-ceph-all-2 docker[309936]: Error response from daemon: No such container: ceph-osd-1
Feb 07 22:57:09 overcloud-ceph-all-2 systemd[1]: Started Ceph OSD.
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: restorecon /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a --path /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: ln -snf /dev/ceph-72de8451-9a9c-4462-b5af-59e442f225fa/osd-data-fe82915e-4d28-4744-988a-bd4250caa54a /var/lib/ceph/osd/ceph-1/block
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -R ceph:ceph /dev/mapper/ceph--72de8451--9a9c--4462--b5af--59e442f225fa-osd--data--fe82915e--4d28--4744--988a--bd4250caa54a
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: --> ceph-volume lvm activate successful for osd ID: 1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: 2019-02-07 22:57:32 /entrypoint.sh: SUCCESS
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: exec: PID 310325: spawning /usr/bin/ceph-osd --cluster ceph -f -i 1
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: exec: Waiting 310325 to quit
Feb 07 22:57:32 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Feb 07 22:57:33 overcloud-ceph-all-2 ceph-osd-run.sh[309949]: 2019-02-07 22:57:33.370792 7fb551040d80 -1 osd.1 100 log_to_monitors {default=true}
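One observation on the log above: the "Error response from daemon: No such container: ceph-osd-1" lines appear between "Starting Ceph OSD..." and "Started Ceph OSD.", which suggests they are the benign output of the unit's pre-start docker stop/rm cleanup, not the failure itself (the failed umount of /dev/sda2 looks like the more suspicious part). For illustration only, a rough sketch of what ceph-ansible's containerized ceph-osd unit looks like; this fragment is reconstructed from memory, and the exact directives and paths in the shipped template may differ:

[Unit]
Description=Ceph OSD
After=docker.service

[Service]
# The "-" prefix tells systemd to ignore failures of these commands. When no
# stale container named ceph-osd-<id> exists, docker prints "Error response
# from daemon: No such container: ceph-osd-<id>" here, which is harmless.
ExecStartPre=-/usr/bin/docker stop ceph-osd-%i
ExecStartPre=-/usr/bin/docker rm -f ceph-osd-%i
ExecStart=/usr/share/ceph-osd-run.sh %i
ExecStop=-/usr/bin/docker stop ceph-osd-%i
Restart=always

[Install]
WantedBy=multi-user.target

If that reading is right, the messages on a healthy start are just noise, and the real question is why osd_volume_activate tries (and fails) to unmount /dev/sda2, a device that is not in the CephAnsibleDisksConfig devices list.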