Description of problem:

- The customer was upgrading from RHCS 4.3 to RHCS 5.1. The upgrade was successful except for the "cephadm-adopt.yml" playbook.
- The "cephadm-adopt.yml" playbook run failed at "TASK [adopt osd daemon]".
- "TASK [adopt osd daemon]" failed because the ceph-osd services failed to start.
- From the ansible log of the first "cephadm-adopt.yml" run:

~~~
Non-zero exit code 1 from systemctl start ceph-1b148142-f71b-4e49-9e99-1b9c506655aa
systemctl: stderr Job for ceph-1b148142-f71b-4e49-9e99-1b9c506655aa.service failed because the control process exited with error code. <<<
systemctl: stderr See "systemctl status ceph-1b148142-f71b-4e49-9e99-1b9c506655aa.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "/sbin/cephadm", line 8826, in <module>
    main()
  File "/sbin/cephadm", line 8814, in main
    r = ctx.func(ctx)
  File "/sbin/cephadm", line 1941, in _default_image
    return func(ctx)
  File "/sbin/cephadm", line 5533, in command_adopt
    command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
  File "/sbin/cephadm", line 5738, in command_adopt_ceph
    osd_fsid=osd_fsid)
  File "/sbin/cephadm", line 3134, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "/sbin/cephadm", line 1619, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: systemctl start ceph-1b148142-f71b-4e49-9e99-1b9c506655aa <<<
~~~

- When "TASK [adopt osd daemon]" was last executed, the message was as below (the customer has run cephadm-adopt.yml several times):

~~~
stdout: osd.101 is already adopted
~~~

- The same is true for all other OSDs on the node. Based on this, I believe the ceph-osd daemons have already been adopted by cephadm, but they fail to start via systemctl.
- All the OSDs on that node are in the down state, and the systemd status of the ceph-osd daemons is failed.
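To confirm where things stand on the affected node, here is a minimal diagnostic sketch; the fsid is the one from the logs above, and the unit names are illustrative, so substitute the exact names systemctl reports on the host:

~~~
# Hedged diagnostic sketch -- unit names are illustrative; use the exact names
# reported by "systemctl list-units 'ceph*' --all" on the affected host.
systemctl list-units 'ceph*' --all                           # locate the failed OSD units
journalctl -u 'ceph-1b148142-f71b-4e49-9e99-1b9c506655aa*'   # start-up logs for the adopted OSDs
cephadm ls                                                   # per-daemon "style" field: "legacy" vs "cephadm:v1" (adopted)
~~~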
Traceback log of ceph-osd:

~~~
Jul 04 15:18:58 node systemd[1]: Starting Ceph osd.142 for 1b148142-f71b-4e49-9e99-1b9c506655aa...
Jul 04 15:19:00 node bash[6583]: Traceback (most recent call last):
Jul 04 15:19:00 node bash[6583]:   File "/usr/sbin/ceph-volume", line 11, in <module>
Jul 04 15:19:00 node bash[6583]:     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 40, in __init__
Jul 04 15:19:00 node bash[6583]:     self.main(self.argv)
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
Jul 04 15:19:00 node bash[6583]:     return f(*a, **kw)
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 152, in main
Jul 04 15:19:00 node bash[6583]:     terminal.dispatch(self.mapper, subcommand_args)
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
Jul 04 15:19:00 node bash[6583]:     instance.main()
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
Jul 04 15:19:00 node bash[6583]:     terminal.dispatch(self.mapper, self.argv)
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
Jul 04 15:19:00 node bash[6583]:     instance.main()
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 377, in main
Jul 04 15:19:00 node bash[6583]:     self.activate(args)
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
Jul 04 15:19:00 node bash[6583]:     return func(*a, **kw)
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 301, in activate
Jul 04 15:19:00 node bash[6583]:     activate_bluestore(lvs, args.no_systemd)
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 157, in activate_bluestore
Jul 04 15:19:00 node bash[6583]:     configuration.load()
Jul 04 15:19:00 node bash[6583]:   File "/usr/lib/python3.6/site-packages/ceph_volume/configuration.py", line 51, in load
Jul 04 15:19:00 node bash[6583]:     raise exceptions.ConfigurationError(abspath=abspath)
Jul 04 15:19:00 node bash[6583]: ceph_volume.exceptions.ConfigurationError: Unable to load expected Ceph config at: /etc/ceph/cephdev.conf <<<<
Jul 04 15:19:02 node systemd[1]: ceph-1b148142-f71b-4e49-9e99-1b9c506655aa.service: Control process exited, code=exited status=1
Jul 04 15:19:03 node systemd[1]: ceph-1b148142-f71b-4e49-9e99-1b9c506655aa.service: Failed with result 'exit-code'.
Jul 04 15:19:03 node systemd[1]: Failed to start Ceph osd.142 for 1b148142-f71b-4e49-9e99-1b9c506655aa.
Jul 04 15:19:13 node systemd[1]: ceph-1b148142-f71b-4e49-9e99-1b9c506655aa.service: Service RestartSec=10s expired, scheduling restart.
Jul 04 15:19:13 node systemd[1]: ceph-1b148142-f71b-4e49-9e99-1b9c506655aa.service: Scheduled restart job, restart counter is at 1.
Jul 04 15:19:13 node systemd[1]: Stopped Ceph osd.142 for 1b148142-f71b-4e49-9e99-1b9c506655aa.
~~~

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 5.1z2 - 5.1.2
ceph version 16.2.7-126.el8cp
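The ceph_volume ConfigurationError above points at /etc/ceph/cephdev.conf, i.e. a non-default cluster name ("cephdev" rather than the default "ceph"). A minimal sketch of a check for that condition, a hedged illustration rather than a fix, with the path taken from the traceback:

~~~
# Hedged sketch: per the traceback, ceph-volume derives the config path from
# the cluster name ("/etc/ceph/<cluster>.conf"), so a cluster named "cephdev"
# needs /etc/ceph/cephdev.conf to be present where the activation runs.
ls -l /etc/ceph/
test -e /etc/ceph/cephdev.conf || echo "cephdev.conf missing -- matches the ConfigurationError above"
~~~

If that file is absent in the environment where ceph-volume runs, every restart attempt would fail the same way, which matches the systemd restart loop in the log above.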
*** This bug has been marked as a duplicate of bug 2058038 ***