Description of problem:
Activation on boot relies on retrying a few times when a volume has not come up yet, using a non-zero exit status as the signal to try again. Activate ignores this by not checking the exit status of the commands it runs, so the workflow does not retry and volumes are not activated after a reboot.

How reproducible:
Not always; requires rebooting a system that is not very fast.

Actual results:
[root@ceph-osd0 ceph]# tail ceph-volume-systemd.log
[2017-08-09 12:07:19,369][systemd][INFO ] raw systemd input received: lvm-0-8138fb63-affc-4aae-b784-346b86d09439
[2017-08-09 12:07:19,369][systemd][INFO ] parsed sub-command: lvm, extra data: 0-8138fb63-affc-4aae-b784-346b86d09439
[2017-08-09 12:07:19,369][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 0-8138fb63-affc-4aae-b784-346b86d09439
[2017-08-09 12:07:25,627][ceph_volume.process][INFO ] stdout Running command: sudo lvs -o lv_tags,lv_path,lv_name,vg_name --reportformat=json
Running command: sudo mount -v /dev/test_group/test_volume /var/lib/ceph/osd/ceph-0
 stderr: mount: special device /dev/test_group/test_volume does not exist
Running command: chown -R ceph:ceph /dev/sdc
Running command: sudo systemctl enable ceph-volume@lvm-0-8138fb63-affc-4aae-b784-346b86d09439
Running command: sudo systemctl start ceph-osd@0
[2017-08-09 12:07:25,683][systemd][INFO ] successfully trggered activation for: 0-8138fb63-affc-4aae-b784-346b86d09439

Expected results:
The OSD is actually mounted and started.
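For illustration only, the retry behaviour described above can be sketched in Python as follows. This is a minimal sketch, not the ceph-volume implementation: the names run_checked and activate_with_retries, the default number of tries, and the interval are all assumptions made for the example. The point it shows is that the retry loop can only work if a non-zero exit status is turned into a failure instead of being ignored.

```python
import subprocess
import time


def run_checked(command):
    """Run a command and raise CalledProcessError on a non-zero exit status.

    Hypothetical helper: without check=True a failed mount would go unnoticed.
    """
    subprocess.run(command, check=True)


def activate_with_retries(command, tries=5, interval=1.0, sleep=time.sleep):
    """Retry `command` until it exits 0 or `tries` attempts are used up.

    Returns the number of attempts made; re-raises the last failure.
    The defaults are placeholders, not values taken from ceph-volume.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        try:
            run_checked(command)
            return attempt
        except subprocess.CalledProcessError:
            if attempt == tries:
                raise  # out of attempts: report the failure to the caller
            sleep(interval)  # give a slow device time to appear
```

If run_checked swallowed the exit status, the loop would return after the first attempt even though the mount failed, which matches the log above: the mount error is printed, yet activation is reported as successful.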
Merged to upstream master as part of pull request:
https://github.com/ceph/ceph/pull/16919

Relevant commits:

ceph-volume: lvm activate should not ignore exit status codes
c866123017a1defac249bebe76cc7bbaddf3cf67

ceph-volume: util add a helper to check if device is mounted
d77d86aae11fba01834bb8d60633f3f49126c783

ceph-volume: lvm activate should check if the device is mounted to prevent errors from mount
c61aea41f1d07b824e169bf12328b7eb0055e23f
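As a rough illustration of what a "check if device is mounted" helper could look like, here is a minimal Python sketch. It is not the code from the commits above: the function names, the mounts_file parameter, and the approach of reading /proc/mounts are assumptions for the example. It also compares device names literally and does not resolve symlinks, which a real helper would likely need to handle.

```python
def parse_mounts(text):
    """Map each mounted device to the list of paths it is mounted on."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, path = fields[0], fields[1]
        mounts.setdefault(device, []).append(path)
    return mounts


def is_mounted(device, destination=None, mounts_file="/proc/mounts"):
    """Return True if `device` is mounted (optionally, on `destination`).

    Hypothetical helper: names are compared as plain strings.
    """
    with open(mounts_file) as f:
        mounts = parse_mounts(f.read())
    paths = mounts.get(device, [])
    if destination is None:
        return bool(paths)
    return destination in paths
```

With a helper of this kind, activation can skip the mount call when the device is already mounted on the OSD directory, so that a retried activation does not fail on a mount error it caused itself.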
At this point luminous is permanently branched from master (http://marc.info/?l=ceph-devel&m=150212189321868&w=2). We need a PR to the luminous branch with the appropriate cherry-picks in order for this to land in v12.2.0 upstream.
PR targeting luminous: https://github.com/ceph/ceph/pull/16970/
Configured ceph-volume OSDs and rebooted the machine 15 to 20 times; in 3 of those reboots the boot was slow, taking up to 30-45 seconds. No problems were observed: the OSDs came up without any issues.

Based on the above data, moving this bug to the verified state. Let me know if anything indicates otherwise.

Verified in build:
ceph version 12.2.1-39.el7cp (22e26be5a4920c95c43f647b31349484f663e4b9) luminous (stable)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387