Bug 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: rc
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks: 1629656 1630975 1719023
Reported: 2018-11-09 01:43 UTC by Vasishta
Modified: 2019-06-10 19:47 UTC (History)

Fixed In Version: RHEL: ceph-ansible-3.2.17-1.el7cp Ubuntu: ceph-ansible_3.2.17-2redhat1
Doc Type: Bug Fix
Doc Text:
.The values passed into devices in `osds.yml` are now validated
Previously, in the `osds.yml` file of the Ansible playbook, the values passed into the `devices` parameter were not validated. This caused errors when `ceph-disk`, `parted`, or other device preparation tools failed to operate on devices that did not exist. It also caused errors if the number of values passed into the `dedicated_devices` parameter was not equal to the number of values passed into `devices`. With this update, the values are validated as expected, and none of the above-mentioned errors occur.
Clone Of:
: 1719023 (view as bug list)
Environment:
Last Closed: 2019-06-10 19:47:51 UTC
Target Upstream Version:


Attachments
File contains playbook log (1.12 MB, text/plain)
2018-11-09 01:43 UTC, Vasishta
File contains playbook log - validation of devices in lvm scenario (111.97 KB, text/plain)
2019-04-25 07:22 UTC, Vasishta
File contains playbook log - validation of dedicated_devices (113.44 KB, text/plain)
2019-04-25 07:27 UTC, Vasishta


Links
System ID Priority Status Summary Last Updated
Github /ceph ceph-ansible pull 4066 None None None 2019-06-07 08:53:57 UTC
Red Hat Product Errata RHSA-2019:0911 None None None 2019-04-30 15:57:00 UTC
Github ceph ceph-ansible pull 3354 None None None 2018-11-22 16:32:50 UTC
Github ceph ceph-ansible pull 3661 None None None 2019-03-01 07:48:25 UTC

Description Vasishta 2018-11-09 01:43:54 UTC
Created attachment 1503537 [details]
File contains playbook log

Description of problem:
While testing negative scenarios, I provided unequal numbers of devices and dedicated_devices, expecting ceph-validate to fail the playbook [1].

However, the ceph-validate task "include check_devices.yml" was skipped [2]. As a result, the task "ceph-osd : manually prepare ceph "bluestore" non-containerized osd disk(s) with a dedicated device for db and wal" failed with "ceph-disk prepare: error: argument --block.db: expected one argument" [3].

Version-Release number of selected component (if applicable):
3.2.0~rc1-2redhat1

How reproducible:
Always

Steps to Reproduce:
1. Provide mismatched numbers of devices and dedicated_devices
2. Set osd_scenario to non-collocated
3. Run the playbook

Actual results:
[2] 
2018-11-08 15:02:45,545 p=109580 u=ubuntu |  TASK [ceph-validate : include check_devices.yml] 
...
2018-11-08 15:02:45,628 p=109580 u=ubuntu |  skipping: [magna042] => {
    "changed": false,
    "skip_reason": "Conditional result was False"
}

[3]
 TASK [ceph-osd : prepare ceph "bluestore" non-containerized osd disk(s) non-collocated] 
"module_args": {
            "_raw_params": "ceph-disk prepare --cluster ceph --bluestore --dmcrypt --block.db  --block.wal  /dev/sdc",
...
"msg": "non-zero return code",
....
"ceph-disk prepare: error: argument --block.db: expected one argument"

Expected results:
ceph-validate should have failed the playbook with a suitable error message.
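
The wording of the error in [3] matches what Python's argparse reports when an option that takes a value is immediately followed by another option. The sketch below reproduces that behaviour with a hypothetical, minimal parser. It assumes ceph-disk parses its options with argparse and is not the real ceph-disk parser; the names RaisingParser, parse_prepare_args and prepare_error are made up for illustration.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


def parse_prepare_args(argv):
    # Hypothetical subset of options, NOT the real ceph-disk option set.
    parser = RaisingParser(prog="ceph-disk prepare")
    parser.add_argument("--block.db", dest="block_db")
    parser.add_argument("--block.wal", dest="block_wal")
    parser.add_argument("data")
    return parser.parse_args(argv)


def prepare_error(argv):
    """Return the parser's error message for argv, or None if parsing succeeds."""
    try:
        parse_prepare_args(argv)
    except ValueError as exc:
        return str(exc)
    return None


# Same shape as the failing command line: both option values are empty,
# so "--block.db" is followed directly by "--block.wal".
print(prepare_error(["--block.db", "--block.wal", "/dev/sdc"]))
```

An empty value for --block.db is therefore enough to produce this message, which is consistent with the playbook reaching ceph-disk with fewer dedicated_devices than devices.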

Additional info:
OSD configuration used for the negative test:
$ cat /usr/share/ceph-ansible/group_vars/osds.yml| egrep -v ^# | grep -v ^$
---
dummy:
dmcrypt: true
osd_objectstore: bluestore 
osd_scenario: non-collocated
devices:
   - /dev/sdb
   - /dev/sdc
   - /dev/sdd
dedicated_devices:
   - /dev/sdd


[1] Expected because of the following lines in check_devices.yml - https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-validate/tasks/check_devices.yml#L33-L38

[2] Condition that may be causing the check_devices tasks to be skipped - https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-validate/tasks/main.yml#L54
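
For reference, the kind of check expected here can be sketched in a few lines of Python. This is only an illustration of comparing the two lists from osds.yml; it assumes the intended rule is "one dedicated device per device" and does not reproduce what check_devices.yml actually does. The function name check_dedicated_devices is hypothetical.

```python
def check_dedicated_devices(devices, dedicated_devices):
    """Return a list of error strings; an empty list means the lists are consistent."""
    errors = []
    if not devices:
        errors.append("devices must not be empty")
    # Assumed rule: each entry in devices needs a matching dedicated device.
    if len(devices) != len(dedicated_devices):
        errors.append(
            "devices has %d entries but dedicated_devices has %d"
            % (len(devices), len(dedicated_devices))
        )
    return errors


# Values taken from the osds.yml shown above.
for problem in check_dedicated_devices(
    ["/dev/sdb", "/dev/sdc", "/dev/sdd"], ["/dev/sdd"]
):
    print(problem)
```

Failing at this point, before any ceph-osd task runs, would give the user a message about the mismatch instead of the ceph-disk argument error.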

Comment 3 Vasishta 2018-11-09 17:31:05 UTC
Hi,

The changes from https://github.com/ceph/ceph-ansible/pull/3319/commits/aa5a6be48b8eb7857bab1a424cc758b2134f63e7 are working fine for me.

(With these changes, the tasks in check_devices are included in the non-collocated and lvm_batch scenarios.)

Comment 4 leseb 2018-11-12 10:15:19 UTC
Thanks for working on a fix, Vasishta.

Comment 5 Vasishta 2018-12-03 09:18:15 UTC
Hi Drew,

We found that the devices are not validated in the collocated scenario either, which means that 'devices' is not validated in any case.

We tried some typos in the list of 'devices', and the playbook failed in the task
 "ceph-osd : read information about the devices" with the message "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /home/ubuntu/abcd -- unit 'MiB' print'"

I think this BZ should be considered a blocker for 1630975 because, as a result of this BZ, one of the prominent variables is not being validated.
 
Please let me know your views.

Comment 6 Vasishta 2018-12-04 05:50:41 UTC
With the fix in PR 3354, 'devices' will be validated, but I think the fix is partial.

Why partial:
The task "ceph-validate : validate devices is actually a device" fails with the message "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /dev/abcd -- unit 'MiB' print'",

I expected the task "fail if one of the devices is not a device" to fail instead, with the error message "is not a block special file!".
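
To illustrate the check I expected to fire, here is a minimal Python sketch of a "is this path a block special file" test. It is not the ceph-ansible implementation; the function name non_block_devices is hypothetical, and treating a missing path the same as a non-block file is an assumption.

```python
import os
import stat


def non_block_devices(paths):
    """Return the subset of paths that are missing or are not block special files."""
    bad = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Assumption: a missing or unreadable path is reported like a non-block file.
            bad.append(path)
            continue
        if not stat.S_ISBLK(mode):
            bad.append(path)
    return bad


# A typo such as /dev/abcd would be reported here, before parted is ever invoked.
for path in non_block_devices(["/dev/abcd"]):
    print("%s is not a block special file!" % path)
```

With a check of this shape running first, a mistyped device name would be reported by the validation step rather than by the parted call.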

Comment 9 John Brier 2018-12-17 21:15:02 UTC
Thanks for updating the Doc Text Vasishta.

Can you tell me why you didn't say no values passed into devices are validated like you mentioned in comment #5?

I updated the Doc Text as if devices are never validated. Here is your old Doc Text for reference if we need to go back to something like what you said:

Cause: 
The variable 'devices' is not validated in the lvm and non-collocated osd scenarios because the respective ceph-validate tasks get skipped.

Consequence: 
The playbook completes the ceph-validate tasks even when the user provides an incorrect value for the variable 'devices', which is not expected.

Workaround (if any): 
The user must verify the value set for devices before starting the playbook to avoid an incomplete cluster configuration during the ceph-osd tasks.

Result: 
Though devices are not validated during the ceph-validate tasks, an incomplete configuration can be avoided.

Comment 10 Vasishta 2018-12-18 03:58:33 UTC
Hi John,

(In reply to John Brier from comment #9)

> Can you tell me why you didn't say no values passed into devices are
> validated like you mentioned in comment #5?

Oops, sorry.
I had forgotten about comment 5 while filling in the Doc Text. Yes, validation of 'devices' is getting skipped in all scenarios because of [1].
I only considered the title of the BZ while formulating the Doc Text. Thanks for reminding and correcting me.

Regards,
Vasishta Shastry
QE, Ceph


[1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-validate/tasks/main.yml#L54

Comment 11 John Brier 2018-12-18 15:22:34 UTC
(In reply to Vasishta from comment #10)
> 
> Oops, sorry.
> I had forgotten about comment 5 while filling in the Doc Text. Yes, validation
> of 'devices' is getting skipped in all scenarios because of [1].
> I only considered the title of the BZ while formulating the Doc Text. Thanks
> for reminding and correcting me.
> 

No problem. I already wrote the Doc Text as if it didn't validate any of the devices. I just wanted to make sure that was right.

Comment 17 Vasishta 2019-04-16 04:02:27 UTC
Hi,

Though devices are now being validated in the non-collocated and lvm-batch scenarios, the playbook is failing in a task other than the intended one.

I think the playbook is expected to fail at tasks [3] and [4] for devices and dedicated devices respectively, but it is failing at [1] and [2].
Moving back to the ASSIGNED state. As the basic functionality is working now and the remaining issue is only cosmetic, changing severity to *low*.


[1] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L4

[2] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L20

[3] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L11

[4] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L29


Regards,
Vasishta Shastry
QE, Ceph

Comment 20 Vasishta 2019-04-25 07:22:54 UTC
Created attachment 1558487 [details]
File contains playbook log - validation of devices in lvm scenario

Comment 21 Vasishta 2019-04-25 07:27:21 UTC
Created attachment 1558488 [details]
File contains playbook log - validation of dedicated_devices

$ rpm -qa|grep ceph-ansible
ceph-ansible-3.2.13-1.el7cp.noarch

Comment 23 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911

