Bug 1648168

Summary: ceph-validate : devices are not validated in non-collocated and lvm_batch scenario
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED NEXTRELEASE
QA Contact: Vasishta <vashastr>
Severity: low
Docs Contact:
Priority: high
Version: 3.2
CC: anharris, aschoen, ceph-eng-bugs, edonnell, gabrioux, gmeno, hnallurv, jbrier, nthomas, pasik, sankarshan, shan, tserlin, vashastr
Target Milestone: rc
Keywords: Reopened
Target Release: 3.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.17-1.el7cp Ubuntu: ceph-ansible_3.2.17-2redhat1
Doc Type: Bug Fix
Doc Text:
.The values passed into `devices` in `osds.yml` are now validated

Previously, in the `osds.yml` file of the Ansible playbook, the values passed into the `devices` parameter were not validated. This caused errors when `ceph-disk`, `parted`, or other device preparation tools failed to operate on devices that did not exist. It also caused errors if the number of values passed into the `dedicated_devices` parameter was not equal to the number of values passed into `devices`. With this update, the values are validated as expected, and none of the above-mentioned errors occur.
Story Points: ---
Clone Of:
Clones: 1719023
Environment:
Last Closed: 2019-06-10 19:47:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1629656, 1630975, 1719023    
Attachments:
Description                                                          Flags
File contains playbook log                                           none
File contains playbook log - validation of devices in lvm scenario   none
File contains playbook log - validation of dedicated_devices         none

Description Vasishta 2018-11-09 01:43:54 UTC
Created attachment 1503537 [details]
File contains playbook log

Description of problem:
While testing negative scenarios, I provided unequal numbers of devices and dedicated_devices, expecting ceph-validate to fail the playbook [1].

However, the ceph-validate task "include check_devices.yml" [2] was skipped, which resulted in the failure of the task "ceph-osd : manually prepare ceph "bluestore" non-containerized osd disk(s) with a dedicated device for db and wal" with the message "ceph-disk prepare: error: argument --block.db: expected one argument" [3].

Version-Release number of selected component (if applicable):
3.2.0~rc1-2redhat1

How reproducible:
Always

Steps to Reproduce:
1. Configure a mismatched number of devices and dedicated_devices.
2. Set osd_scenario to non-collocated.
3. Run the playbook.

Actual results:
[2] 
2018-11-08 15:02:45,545 p=109580 u=ubuntu |  TASK [ceph-validate : include check_devices.yml] 
...
2018-11-08 15:02:45,628 p=109580 u=ubuntu |  skipping: [magna042] => {
    "changed": false,
    "skip_reason": "Conditional result was False"
}

[3]
 TASK [ceph-osd : prepare ceph "bluestore" non-containerized osd disk(s) non-collocated] 
"module_args": {
            "_raw_params": "ceph-disk prepare --cluster ceph --bluestore --dmcrypt --block.db  --block.wal  /dev/sdc",
...
"msg": "non-zero return code",
....
"ceph-disk prepare: error: argument --block.db: expected one argument"

Expected results:
ceph-validate should have failed the playbook with a suitable error message.

Additional info:
OSD configuration used (negative approach):
$ cat /usr/share/ceph-ansible/group_vars/osds.yml| egrep -v ^# | grep -v ^$
---
dummy:
dmcrypt: true
osd_objectstore: bluestore 
osd_scenario: non-collocated
devices:
   - /dev/sdb
   - /dev/sdc
   - /dev/sdd
dedicated_devices:
   - /dev/sdd
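
The mismatch in the configuration above (three entries in devices, one in dedicated_devices) is the kind of error that can be caught before any device preparation tool runs. Below is a minimal Python sketch of such a check. The helper name check_device_counts is hypothetical and is not part of ceph-ansible, whose own check lives in check_devices.yml [1]. The sketch only illustrates the comparison this report expects the validation to perform.

```python
def check_device_counts(devices, dedicated_devices):
    """Return an error message if the two lists differ in length, else None.

    Hypothetical helper for illustration only; not ceph-ansible code.
    """
    if len(devices) != len(dedicated_devices):
        return (
            "devices has {} entries but dedicated_devices has {}; "
            "the non-collocated scenario expects one dedicated device per device"
        ).format(len(devices), len(dedicated_devices))
    return None


# Values taken from the osds.yml shown above.
devices = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]
dedicated_devices = ["/dev/sdd"]

error = check_device_counts(devices, dedicated_devices)
if error:
    print("validation failed:", error)
```

Failing at this point would give the user a message that names both variables, rather than the later ceph-disk argument error.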


[1] This was expected because of the following lines in check_devices.yml: https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-validate/tasks/check_devices.yml#L33-L38

[2] The condition that might be causing the check_devices tasks to be skipped: https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-validate/tasks/main.yml#L54

Comment 3 Vasishta 2018-11-09 17:31:05 UTC
Hi,

The changes from https://github.com/ceph/ceph-ansible/pull/3319/commits/aa5a6be48b8eb7857bab1a424cc758b2134f63e7 are working fine for me.

(With the above changes, the tasks in check_devices are included in the non-collocated and lvm_batch scenarios.)

Comment 4 Sébastien Han 2018-11-12 10:15:19 UTC
Thanks for working on a fix, Vasishta.

Comment 5 Vasishta 2018-12-03 09:18:15 UTC
Hi Drew,

We found that the devices are not validated in the collocated scenario either, which means that 'devices' is not validated in any case.

We tried some typos in the list of 'devices', and the playbook failed in the task "ceph-osd : read information about the devices" with the message "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /home/ubuntu/abcd -- unit 'MiB' print'"

I think this BZ should be considered a blocker for 1630975 because, as a result of this BZ, one of the prominent variables is not being validated.
 
Please let me know your views.

Comment 6 Vasishta 2018-12-04 05:50:41 UTC
With the fix in PR 3354, 'devices' will be validated, but I think the fix is partial.

Why partial:
The task "ceph-validate : validate devices is actually a device" fails with the message "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /dev/abcd -- unit 'MiB' print'",

My assumption is that the task "fail if one of the devices is not a device" should fail instead, with the error message "is not a block special file!".
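
The behavior expected here, rejecting a path that is not a block special file before parted is ever invoked on it, can be sketched in Python. This is an illustration only: the helpers is_block_device and find_non_block_devices are hypothetical names, not the ceph-validate implementation, and the message text simply echoes the one quoted above. A missing path is treated the same as a non-block file.

```python
import os
import stat


def is_block_device(path):
    """Return True only if path exists and is a block special file."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        # A missing or unreadable path cannot be a usable block device.
        return False


def find_non_block_devices(paths):
    """Return one error message per path that is not a block special file."""
    return [
        "{} is not a block special file!".format(path)
        for path in paths
        if not is_block_device(path)
    ]


# /dev/abcd is the mistyped device name from the log above.
for message in find_non_block_devices(["/dev/abcd"]):
    print(message)
```

Running a check of this kind first would make the failure point at the invalid entry in 'devices' instead of surfacing as a parted error.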

Comment 9 John Brier 2018-12-17 21:15:02 UTC
Thanks for updating the Doc Text, Vasishta.

Can you tell me why you didn't say no values passed into devices are validated like you mentioned in comment #5?

I updated the Doc Text as if devices are never validated. Here is your old Doc Text for reference if we need to go back to something like what you said:

Cause: 
The variable 'devices' is not validated in the lvm and non-collocated osd scenarios because the respective ceph-validate tasks get skipped.

Consequence:
The playbook completes the ceph-validate tasks even when the user provides an incorrect value for the variable 'devices', which is not expected.

Workaround (if any):
The user must verify the value set for devices before starting the playbook to avoid an incomplete cluster configuration during the ceph-osd tasks.

Result:
Though devices are not validated during the ceph-validate tasks, an incomplete configuration can be avoided.

Comment 10 Vasishta 2018-12-18 03:58:33 UTC
Hi John,

(In reply to John Brier from comment #9)

> Can you tell me why you didn't say no values passed into devices are
> validated like you mentioned in comment #5?

Oops, sorry.
I had forgotten about comment 5 while filling in the Doc Text. Yes, validation of 'devices' is skipped in all scenarios because of [1].
I only considered the title of the BZ while formulating the Doc Text. Thanks for the reminder and the correction.

Regards,
Vasishta Shastry
QE, Ceph


[1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-validate/tasks/main.yml#L54

Comment 11 John Brier 2018-12-18 15:22:34 UTC
(In reply to Vasishta from comment #10)
> 
> Oops, sorry.
> I had forgotten about comment 5 while filling in the Doc Text. Yes, validation
> of 'devices' is skipped in all scenarios because of [1].
> I only considered the title of the BZ while formulating the Doc Text. Thanks
> for the reminder and the correction.
> 

No problem. I already wrote the Doc Text as if it didn't validate any of the devices. I just wanted to make sure that was right.

Comment 17 Vasishta 2019-04-16 04:02:27 UTC
Hi,

Although devices are now validated in the non-collocated and lvm-batch scenarios, the playbook fails in a task other than the intended one.

I think the playbook is expected to fail at tasks [3] and [4] for devices and dedicated devices respectively, but it fails at [1] and [2].
Moving this back to the ASSIGNED state. As the basic functionality works now and the remaining issue is only cosmetic, I am changing the severity to *low*.


[1] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L4

[2] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L20

[3] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L11

[4] https://github.com/ceph/ceph-ansible/blob/ede2773ecc62c3e5aa07343dc355a10bc07db15a/roles/ceph-validate/tasks/check_devices.yml#L29


Regards,
Vasishta Shastry
QE, Ceph

Comment 20 Vasishta 2019-04-25 07:22:54 UTC
Created attachment 1558487 [details]
File contains playbook log - validation of devices in lvm scenario

Comment 21 Vasishta 2019-04-25 07:27:21 UTC
Created attachment 1558488 [details]
File contains playbook log - validation of dedicated_devices

$ rpm -qa|grep ceph-ansible
ceph-ansible-3.2.13-1.el7cp.noarch

Comment 23 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911