Bug 1479797 - do not ignore non-zero exit status when activating
Summary: do not ignore non-zero exit status when activating
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
: 3.0
Assignee: Alfredo Deza
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-08-09 12:18 UTC by Alfredo Deza
Modified: 2017-12-05 23:39 UTC (History)
7 users

Fixed In Version: RHEL: ceph-12.1.4-1.el7cp Ubuntu: ceph_12.1.4-2redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-05 23:39:05 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 16970 0 None None None 2017-08-10 16:09:28 UTC
Red Hat Product Errata RHBA-2017:3387 0 normal SHIPPED_LIVE Red Hat Ceph Storage 3.0 bug fix and enhancement update 2017-12-06 03:03:45 UTC

Description Alfredo Deza 2017-08-09 12:18:13 UTC
Description of problem: The on-boot activation workflow relies on retrying the mount a few times when a volume is not yet available, using a non-zero exit status to signal failure. Activate ignores the exit status entirely, so the workflow never retries and volumes fail to activate after a reboot.
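The fix amounts to actually checking the exit status of the commands that activation runs. A minimal sketch of the idea in Python (hypothetical helper name, not the actual ceph-volume code):

```python
import subprocess

def run(command):
    """Run a command, raising on a non-zero exit status.

    The broken activation path ignored the status, so a failed mount
    looked like a success and the boot-time retry never kicked in.
    """
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if result.returncode != 0:
        raise RuntimeError(
            '%s failed with exit status %d: %s'
            % (command[0], result.returncode,
               result.stderr.decode().strip())
        )
    return result.stdout.decode()
```

A caller such as `run(['mount', '-v', device, mountpoint])` would then propagate the failure up to systemd, which can retry activation instead of reporting success against an unmounted volume.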


How reproducible: Not always; requires rebooting on a system that is slow to bring volumes up.


Actual results:
[root@ceph-osd0 ceph]# tail ceph-volume-systemd.log
[2017-08-09 12:07:19,369][systemd][INFO  ] raw systemd input received: lvm-0-8138fb63-affc-4aae-b784-346b86d09439
[2017-08-09 12:07:19,369][systemd][INFO  ] parsed sub-command: lvm, extra data: 0-8138fb63-affc-4aae-b784-346b86d09439
[2017-08-09 12:07:19,369][ceph_volume.process][INFO  ] Running command: ceph-volume lvm trigger 0-8138fb63-affc-4aae-b784-346b86d09439
[2017-08-09 12:07:25,627][ceph_volume.process][INFO  ] stdout Running command: sudo lvs -o lv_tags,lv_path,lv_name,vg_name --reportformat=json
Running command: sudo mount -v /dev/test_group/test_volume /var/lib/ceph/osd/ceph-0
 stderr: mount: special device /dev/test_group/test_volume does not exist
Running command: chown -R ceph:ceph /dev/sdc
Running command: sudo systemctl enable ceph-volume@lvm-0-8138fb63-affc-4aae-b784-346b86d09439
Running command: sudo systemctl start ceph-osd@0
[2017-08-09 12:07:25,683][systemd][INFO  ] successfully trggered activation for: 0-8138fb63-affc-4aae-b784-346b86d09439


Expected results: The OSD is actually mounted and started.

Comment 2 Alfredo Deza 2017-08-10 11:59:05 UTC
Merged to upstream master as part of pull request:

    https://github.com/ceph/ceph/pull/16919

Relevant commits:

ceph-volume: lvm activate should not ignore exit status codes
c866123017a1defac249bebe76cc7bbaddf3cf67

ceph-volume: util add a helper to check if device is mounted
d77d86aae11fba01834bb8d60633f3f49126c783

ceph-volume: lvm activate should check if the device is mounted to prevent errors from  mount
c61aea41f1d07b824e169bf12328b7eb0055e23f
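The mounted-device check from the second commit can be approximated by scanning /proc/mounts. A sketch under that assumption (hypothetical signature, not the actual ceph_volume.util code):

```python
def is_mounted(device=None, destination=None):
    """Return True if the given device or mount point appears in /proc/mounts.

    Checking first lets activation skip the mount call (and the resulting
    error from mount) when the volume is already in place.
    """
    with open('/proc/mounts') as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) < 2:
                continue
            dev, mountpoint = fields[0], fields[1]
            if device is not None and dev == device:
                return True
            if destination is not None and mountpoint == destination:
                return True
    return False
```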

Comment 3 Ken Dreyer (Red Hat) 2017-08-10 14:43:14 UTC
At this point luminous is permanently branched from master (http://marc.info/?l=ceph-devel&m=150212189321868&w=2)

We need a PR to the luminous branch with the appropriate cherry-picks in order for this to land in v12.2.0 upstream.

Comment 4 Alfredo Deza 2017-08-10 15:35:34 UTC
PR targeting luminous: https://github.com/ceph/ceph/pull/16970/

Comment 9 Ramakrishnan Periyasamy 2017-11-06 11:25:50 UTC
Configured ceph-volume OSDs and rebooted the machine 15 to 20 times; on 3 of those reboots the machine was slow, taking an extra 30-45 seconds.

No problems were observed: the OSDs came up without any issues.

Based on the above, moving this bug to the verified state; let me know if anything else is needed.

Verified in build:
ceph version 12.2.1-39.el7cp (22e26be5a4920c95c43f647b31349484f663e4b9) luminous (stable)

Comment 12 errata-xmlrpc 2017-12-05 23:39:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

